Effective representations for leveraging language content in multimedia event detection

Author

Shuang Wu; Xiaodan Zhuang; Prem Natarajan

Author_Institution

Speech, Language & Multimedia Business Unit, Raytheon BBN Technologies, Cambridge, MA, USA

fYear

2014

fDate

4-9 May 2014

Firstpage

7123

Lastpage

7127

Abstract

Language content in videos, whether from speech or from overlaid or in-scene video text, can provide high-precision signals for video event detection and retrieval. However, sporadic occurrence, content unrelated to the events of interest, and the high error rates of current speech and text recognition systems on consumer-domain video make these channels difficult to exploit. In this paper, we study different representations of language content to address these challenges. First, we use likelihood-weighted word lattices obtained from a Hidden Markov Model (HMM) based decoding engine to encode many alternate hypotheses, rather than relying on noisy single-best hypotheses. Second, we use an event-independent modified term frequency-inverse document frequency (TF-IDF) weighting scheme to obtain the final feature vector. We present detailed experimental results on the TRECVID MED 2013 dataset, which contains ~150,000 videos, and show that our representation significantly outperforms alternate representations for both speech and video text.
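
The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the "modified" TF-IDF scheme, so the code below assumes a standard smoothed IDF, and it assumes each video's lattice has already been reduced to a list of (word, posterior) pairs. The function names `expected_counts`, `idf_table`, and `tfidf_vector` are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def expected_counts(lattice_arcs):
    """Soft term frequency: sum the posterior of every lattice arc per word.

    lattice_arcs is an assumed input format: an iterable of
    (word, posterior) pairs, with posteriors in [0, 1].
    """
    counts = defaultdict(float)
    for word, posterior in lattice_arcs:
        counts[word] += posterior
    return dict(counts)


def idf_table(docs):
    """Smoothed inverse document frequency over a collection of videos.

    This is a standard smoothed IDF, used here as a stand-in for the
    paper's modified weighting, which the abstract does not define.
    """
    n_docs = len(docs)
    doc_freq = defaultdict(int)
    for counts in docs:
        for word in counts:
            doc_freq[word] += 1
    return {w: math.log((1 + n_docs) / (1 + df)) + 1.0
            for w, df in doc_freq.items()}


def tfidf_vector(counts, idf, vocab):
    """L2-normalized TF-IDF feature vector over a fixed vocabulary order."""
    vec = [counts.get(w, 0.0) * idf.get(w, 0.0) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


# Toy usage: two videos, each given as (word, posterior) lattice arcs.
videos = [
    [("birthday", 0.9), ("party", 0.6), ("birthday", 0.3)],
    [("party", 0.8), ("cake", 0.5)],
]
docs = [expected_counts(arcs) for arcs in videos]
idf = idf_table(docs)
vocab = sorted(idf)
features = [tfidf_vector(d, idf, vocab) for d in docs]
```

Summing posteriors rather than counting words in a single-best transcript lets low-confidence alternatives contribute in proportion to their likelihood. The IDF here is computed over the whole collection without reference to any event label, which is one plausible reading of "event-independent".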

Keywords

hidden Markov models; multimedia computing; natural language processing; speech recognition; text analysis; video retrieval; video signal processing; HMM; Hidden Markov Model; inscene video text; language content representation; leveraging language content; multimedia event detection; overlaid video text; speech recognition; sporadic occurrence; term frequency inverse document frequency; text recognition systems; video event detection; video event retrieval; Event detection; Hidden Markov models; Lattices; Optical character recognition software; Speech; Videos; Vocabulary; TF-IDF; lattices; multimedia event detection; speech recognition; video text OCR;

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on

Conference_Location

Florence

Type

conf

DOI

10.1109/ICASSP.2014.6854982

Filename

6854982