DocumentCode :
1341524
Title :
An Enhanced Bag-of-Visual Word Vector Space Model to Represent Visual Content in Athletics Images
Author :
Kesorn, Kraisak ; Poslad, Stefan
Author_Institution :
Comput. Sci. & Inf. Technol. Dept., Naresuan Univ., Phitsanulok, Thailand
Volume :
14
Issue :
1
fYear :
2012
Firstpage :
211
Lastpage :
222
Abstract :
Images that have a different visual appearance may be semantically related using a higher-level conceptualization. However, image classification and retrieval systems tend to rely only on the low-level visual structure within images. This paper presents a framework to deal with this semantic gap limitation by exploiting the well-known bag-of-visual words (BVW) to represent visual content. The novelty of this paper is threefold. First, the quality of visual words is improved by constructing visual words from representative keypoints. Second, domain-specific “non-informative visual words” are detected which are useless to represent the content of visual data but which can degrade the categorization capability. Distinct from existing frameworks, two main characteristics for non-informative visual words are defined: a high document frequency (DF) and a small statistical association with all the concepts in the collection. The third contribution in this paper is that a novel method is used to restructure the vector space model of visual words with respect to a structural ontology model in order to resolve visual synonym and polysemy problems. The experimental results show that our method can disambiguate visual word senses effectively and can significantly improve classification, interpretation, and retrieval performance for the athletics images.
Keywords :
image classification; image representation; image retrieval; natural language processing; ontologies (artificial intelligence); statistical analysis; vectors; athletic image interpretation; categorization capability; document frequency; domain specific noninformative visual word; enhanced bag of visual word vector space model; image classification; image retrieval; noninformative visual word; representative keypoints; semantic gap limitation; statistical association; structural ontology model; visual appearance; visual content; visual data content; visual polysemy problem; visual synonym problem; visual word sense disambiguation; Accuracy; Image retrieval; Noise; Principal component analysis; Semantics; Videos; Visualization; Bag-of-visual words; non-informative visual words discovery; ontology model; visual content representation; visual words disambiguation;
fLanguage :
English
Journal_Title :
Multimedia, IEEE Transactions on
Publisher :
IEEE
ISSN :
1520-9210
Type :
jour
DOI :
10.1109/TMM.2011.2170665
Filename :
6035787