Title :
WIKI-CMR: A web cross modality dataset for studying and evaluation of cross modality retrieval models
Author :
Wei Xiong ; Shuhui Wang ; Chunjie Zhang ; Qingming Huang
Author_Institution :
School of Comput. & Control Eng., Grad. Univ. of Chinese Acad. of Sci., Beijing, China
Abstract :
With the popularity of Web multimedia data, cross-modality retrieval has become an urgent and challenging problem. Its main challenges are bridging the semantic gap between different modalities and handling abundant data. A well-designed dataset can provide a platform for developing state-of-the-art cross-modality retrieval algorithms. However, existing Web cross-modality datasets are either small or lack full information, such as the hyperlink structure. In this paper, we introduce a new Web cross-modality dataset called “WIKI-CMR”, selecting Wikipedia as a reliable and information-rich data resource and collecting data with a smart crawling strategy. The dataset comprises 74961 documents with textual paragraphs, images and hyperlinks. All documents are categorized into 11 semantic topics. We point out several challenges posed by this dataset and use it to evaluate some well-known cross-modality retrieval models.
Keywords :
Internet; Web sites; hypermedia; information retrieval; multimedia computing; WIKI-CMR; Web cross modality dataset; Web multimedia data; Wikipedia; cross modality retrieval models; data resource; hyperlink structure; semantic gap; semantic topics; smart crawling strategy; state-of-the-art cross-modality retrieval algorithms; textual paragraphs; well-designed dataset; Abstracts; Electronic publishing; Encyclopedias; Multimedia communication; Robots; Multimedia; cross-modality; dataset; retrieval
Conference_Title :
Multimedia and Expo (ICME), 2013 IEEE International Conference on
Conference_Location :
San Jose, CA
DOI :
10.1109/ICME.2013.6607613