DocumentCode
1760428
Title
Interactive Multimodal Visual Search on Mobile Device
Author
Houqiang Li ; Yang Wang ; Tao Mei ; Jingdong Wang ; Shipeng Li
Author_Institution
Univ. of Sci. & Technol. of China, Hefei, China
Volume
15
Issue
3
fYear
2013
fDate
April 2013
Firstpage
594
Lastpage
607
Abstract
This paper describes a novel multimodal interactive image search system for mobile devices. The system, Joint search with ImaGe, Speech, And Word Plus (JIGSAW+), takes full advantage of the multimodal input and natural user interactions of mobile devices. It is designed for users who already have a picture in mind but lack a precise description or name for it. By describing the picture in speech and then refining the recognized query by interactively composing a visual query from exemplary images, the user can easily find the desired images through a few natural multimodal interactions with the mobile device. Compared with our previous work, JIGSAW, the algorithm has been significantly improved in three aspects: 1) a segmentation-based image representation is adopted to remove the artificial block partitions; 2) relative position checking replaces the fixed position penalty; and 3) an inverted index is constructed to replace brute-force matching. The proposed JIGSAW+ achieves a 5% gain in search performance and is ten times faster.
Keywords
image representation; image retrieval; image segmentation; interactive systems; mobile computing; speech recognition; speech-based user interfaces; JIGSAW+; artificial block partition removal; exemplary images; interactive multimodal visual search; inverted index construction; joint search-with-image-speech-and-word plus; mobile devices; multimodal input user interactions; multimodal interactive image search system; natural multimodal interactions; natural user interactions; segmentation-based image representation; visual query; Databases; Internet; Mobile communication; Mobile handsets; Speech; Speech recognition; Visualization; Interactive search; mobile device; mobile visual search; multimodal search;
fLanguage
English
Journal_Title
Multimedia, IEEE Transactions on
Publisher
IEEE
ISSN
1520-9210
Type
jour
DOI
10.1109/TMM.2012.2234730
Filename
6384798