DocumentCode :
569129
Title :
Multimodal Location Estimation of Consumer Media: Dealing with Sparse Training Data
Author :
Choi, Jaeyoung ; Friedland, Gerald ; Ekambaram, Venkatesan ; Ramchandran, Kannan
Author_Institution :
Int. Comput. Sci. Inst., Berkeley, CA, USA
fYear :
2012
fDate :
9-13 July 2012
Firstpage :
43
Lastpage :
48
Abstract :
This article describes a novel approach to the problem of associating geo-locations with consumer-produced multimedia data, such as videos and photos, that are publicly available on social networking websites such as Flickr. We specifically focus on the case where the available training data is sparse, both in absolute numbers and in geographic coverage, compared to the number of untagged query items. We develop a novel graphical-model-based framework for the problem and pose geo-tagging as inference over this graph. The novelty of our algorithm lies in jointly estimating the geo-locations of all the query videos, which yields performance improvements over existing algorithms in the literature that process each query video independently. Our system enables the query videos to act as "virtual" training data that effectively bootstrap the geo-tagging process. The quality of the database improves with each additional query video in the system. Further, our modeling provides a generic theoretical framework that can be used to incorporate any other available textual, visual, or audio features. We evaluate our algorithm on the MediaEval 2011 Placing Task data set and show that, for fixed training data, system performance improves as the number of unlabeled test items increases. The performance gains are shown to be over 10% compared to existing algorithms in the literature.
Keywords :
geographic information systems; graph theory; inference mechanisms; multimedia computing; query processing; social networking (online); statistical analysis; training; video signal processing; Flickr; MediaEval 2011 Placing Task data set; absolute numbers; audio features; consumer-produced multimedia data; database quality improvement; generic theoretical framework; geo-locations; geo-tagging process bootstrapping; geographic coverage; graphical model-based framework; multimodal location estimation; performance gains; performance improvements; query videos; social networking Web sites; sparse training data; textual features; unlabeled test data; virtual training data; visual features; Databases; Graphical models; Random variables; Training; Training data; Videos; Visualization; Belief Propagation; Geo-Tagging; Graphical Models; Multimodal Location Estimation
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Multimedia and Expo (ICME), 2012 IEEE International Conference on
Conference_Location :
Melbourne, VIC
ISSN :
1945-7871
Print_ISBN :
978-1-4673-1659-0
Type :
conf
DOI :
10.1109/ICME.2012.141
Filename :
6298372