DocumentCode :
1314373
Title :
Multimodal Video Indexing and Retrieval Using Directed Information
Author :
Chen, Xu ; Hero, Alfred O., III ; Savarese, Silvio
Author_Institution :
Dept. of Electr. Eng. & Comput. Sci., Univ. of Michigan at Ann Arbor, Ann Arbor, MI, USA
Volume :
14
Issue :
1
fYear :
2012
Firstpage :
3
Lastpage :
16
Abstract :
We propose a novel framework for multimodal video indexing and retrieval using shrinkage optimized directed information assessment (SODA) as a similarity measure. Directed information (DI) is a variant of classical mutual information that attempts to capture the direction of information flow that videos naturally possess. It is applied directly to the empirical probability distributions of audio and visual features over successive frames. We use RASTA-PLP features for audio representation and SIFT features for visual representation. We compute the joint probability density functions of audio and visual features in order to fuse features from different modalities. With SODA, we further estimate the DI between pairs of video-audio modalities in a manner suited to high feature dimension p and small sample size n (large p, small n). We demonstrate the superiority of the SODA approach in video indexing, retrieval, and activity recognition compared with state-of-the-art methods such as hidden Markov models (HMM), support vector machines (SVM), and cross-media indexing space (CMIS), and with noncausal divergence measures such as mutual information (MI). We also demonstrate the success of SODA in audio and video localization and in indexing/retrieval of data with misaligned modalities.
Keywords :
audio-visual systems; feature extraction; indexing; optimisation; statistical distributions; transforms; video retrieval; RASTA-PLP feature; SIFT feature; SODA; activity recognition; audio localization; audio-visual feature representation; classical mutual information; data indexing; data retrieval; information flow; joint probability density function; misaligned modalities; multimodal video indexing; multimodal video retrieval; probability distribution; shrinkage optimized directed information assessment; similarity measure; video localization; video-audio modalities; Hidden Markov models; Humans; Indexing; Joints; Mutual information; Support vector machines; Visualization; Audio-video pattern recognition; multimedia content retrieval; multimodal feature fusion; nonlinear information flow; overfitting prevention; shrinkage optimization
fLanguage :
English
Journal_Title :
Multimedia, IEEE Transactions on
Publisher :
IEEE
ISSN :
1520-9210
Type :
jour
DOI :
10.1109/TMM.2011.2167223
Filename :
6009223