DocumentCode :
253844
Title :
Topic Modeling of Multimodal Data: An Autoregressive Approach
Author :
Yin Zheng ; Yu-Jin Zhang ; Larochelle, Hugo
Author_Institution :
Dept. of Electron. Eng., Tsinghua Univ., Beijing, China
fYear :
2014
fDate :
23-28 June 2014
Firstpage :
1370
Lastpage :
1377
Abstract :
Topic modeling based on latent Dirichlet allocation (LDA) has been a framework of choice for dealing with multimodal data, such as in image annotation tasks. Recently, a new type of topic model called the Document Neural Autoregressive Distribution Estimator (DocNADE) was proposed and demonstrated state-of-the-art performance for text document modeling. In this work, we show how to successfully apply and extend this model to multimodal data, for tasks such as simultaneous image classification and annotation. Specifically, we propose SupDocNADE, a supervised extension of DocNADE that increases the discriminative power of the hidden topic features by incorporating label information into the model's training objective, and we show how to employ SupDocNADE to learn a joint representation from image visual words, annotation words, and class label information. We also describe a simple yet effective way to leverage the spatial positions of the visual words so that SupDocNADE achieves better performance. We test our model on the LabelMe and UIUC-Sports datasets and show that it compares favorably to other approaches, such as the supervised variant of LDA and a Spatial Pyramid Matching (SPM) approach.
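The autoregressive idea summarized in the abstract can be illustrated with a small pure-Python sketch. It is not the authors' implementation. It assumes that a document is a sequence of word indices and that log p(v) is the sum of log p(v_i | v_<i). Each conditional comes from a sigmoid hidden layer, which accumulates input-weight columns for the words seen so far. For simplicity the sketch uses a full softmax output rather than a more efficient tree-structured output. The supervised part is also an assumption: a label classifier reads a hidden layer computed from all words, and training uses a hybrid loss of the form -log p(y|v) - lam * log p(v). The parameter names (W, c, V, b, U, d), the weight lam, and the helper init_params are hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def log_softmax(scores):
    # Numerically stable log-softmax over a list of floats.
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]

def docnade_log_likelihood(words, W, c, V, b):
    """Sketch: log p(v) = sum_i log p(v_i | v_<i) for a list of word indices.

    W: H x K input weights (one column per vocabulary word), c: H hidden biases,
    V: K x H output weights, b: K output biases (all hypothetical names).
    """
    H = len(c)
    acc = list(c)  # running pre-activation: c + sum_{k<i} W[:, v_k]
    total = 0.0
    for w in words:
        h = [sigmoid(a) for a in acc]
        scores = [b[j] + sum(V[j][t] * h[t] for t in range(H)) for j in range(len(b))]
        total += log_softmax(scores)[w]
        # Add the current word's column so later words condition on it.
        for t in range(H):
            acc[t] += W[t][w]
    return total

def supervised_loss(words, label, params, lam=1.0):
    """Assumed hybrid loss: -log p(y | v) - lam * log p(v)."""
    W, c, V, b, U, d = params
    H = len(c)
    # Document-level hidden representation built from all words.
    h = [sigmoid(c[t] + sum(W[t][w] for w in words)) for t in range(H)]
    class_scores = [d[y] + sum(U[y][t] * h[t] for t in range(H)) for y in range(len(d))]
    nll_label = -log_softmax(class_scores)[label]
    nll_words = -docnade_log_likelihood(words, W, c, V, b)
    return nll_label + lam * nll_words

def init_params(K, H, C, seed=0):
    # Hypothetical helper: small random weights, zero biases.
    rng = random.Random(seed)
    r = lambda: rng.uniform(-0.1, 0.1)
    W = [[r() for _ in range(K)] for _ in range(H)]
    V = [[r() for _ in range(H)] for _ in range(K)]
    U = [[r() for _ in range(H)] for _ in range(C)]
    return W, [0.0] * H, V, [0.0] * K, U, [0.0] * C

if __name__ == "__main__":
    params = init_params(K=5, H=3, C=2)
    print(supervised_loss([0, 3, 3, 1], label=1, params=params))
```

Because each conditional is a normalized softmax, the probabilities of all word sequences of a fixed length sum to one. This is what makes the model a proper distribution estimator and not only a feature extractor.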
Keywords :
autoregressive processes; document image processing; image classification; image representation; learning (artificial intelligence); neural nets; text analysis; LDA; LabelMe datasets; SPM approach; SupDocNADE; UIUC-Sports datasets; annotation words; autoregressive approach; class label information; hidden topic feature discriminative power; image annotation; image annotation tasks; image visual words; joint representation learning; latent Dirichlet allocation; multimodal data topic modeling; simultaneous image classification; spatial pyramid matching approach; supervised document neural autoregressive distribution estimator; text document modeling; Computational modeling; Data models; Equations; Joints; Mathematical model; Training; Visualization; multimodal data; neural autoregressive approach; topic modeling;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on
Conference_Location :
Columbus, OH
Type :
conf
DOI :
10.1109/CVPR.2014.178
Filename :
6909574