• DocumentCode
    180188
  • Title
    Effective representations for leveraging language content in multimedia event detection
  • Author
    Shuang Wu; Xiaodan Zhuang; Prem Natarajan
  • Author_Institution
    Speech, Language & Multimedia Business Unit, Raytheon BBN Technologies, Cambridge, MA, USA
  • fYear
    2014
  • fDate
    4-9 May 2014
  • Firstpage
    7123
  • Lastpage
    7127
  • Abstract
    Language content in videos, from speech and from overlaid or in-scene video text, can provide high-precision signals for video event detection and retrieval. However, the sporadic occurrence of such content, content that is unrelated to the events of interest, and the high error rates of current speech and text recognition systems on consumer-domain video make it difficult to exploit these channels. In this paper, we study different representations of language content to address these challenges. First, we utilize likelihood-weighted word lattices obtained from a Hidden Markov Model (HMM) based decoding engine to encode many alternative hypotheses, rather than relying on noisy single-best hypotheses. Second, we utilize an event-independent modified term frequency-inverse document frequency (TF-IDF) weighting scheme to obtain the final feature vector. We present detailed experimental results on the TRECVID MED 2013 dataset containing ~150,000 videos, and show that our representation significantly outperforms alternative representations for both speech and video text.
  • Keywords
    hidden Markov models; multimedia computing; natural language processing; speech recognition; text analysis; video retrieval; video signal processing; HMM; Hidden Markov Model; inscene video text; language content representation; leveraging language content; multimedia event detection; overlaid video text; speech recognition; sporadic occurrence; term frequency inverse document frequency; text recognition systems; video event detection; video event retrieval; Event detection; Hidden Markov models; Lattices; Optical character recognition software; Speech; Videos; Vocabulary; TF-IDF; lattices; multimedia event detection; speech recognition; video text OCR;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
  • Conference_Location
    Florence
  • Type
    conf
  • DOI
    10.1109/ICASSP.2014.6854982
  • Filename
    6854982
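The abstract combines two ideas: soft term counts taken from likelihood-weighted word lattices instead of single-best transcripts, and TF-IDF weighting of those counts. The sketch below illustrates only that general combination under stated assumptions. It assumes a hypothetical input format in which each video's lattice has already been flattened into (word, posterior) arcs with posteriors computed elsewhere (for example by forward-backward over the lattice). It uses plain textbook IDF, log(N / df), and is NOT the paper's event-independent modified TF-IDF scheme, whose details the abstract does not give. All function and type names are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical input format (assumption): one lattice per video, flattened to
# (word, posterior) arcs whose posteriors were computed elsewhere.
Lattice = list[tuple[str, float]]


def expected_counts(lattice: Lattice) -> dict[str, float]:
    """Soft term frequency: sum arc posteriors per word rather than
    counting only the words on the single-best path."""
    counts: dict[str, float] = defaultdict(float)
    for word, posterior in lattice:
        counts[word] += posterior
    return dict(counts)


def tfidf_vectors(lattices: list[Lattice]) -> list[dict[str, float]]:
    """Textbook TF-IDF applied to expected counts.
    NOTE: plain log(N / df) IDF, not the paper's modified weighting."""
    tfs = [expected_counts(lat) for lat in lattices]
    n_docs = len(tfs)
    # Document frequency: number of videos whose lattice contains the word.
    df: dict[str, int] = defaultdict(int)
    for tf in tfs:
        for word in tf:
            df[word] += 1
    return [
        {word: count * math.log(n_docs / df[word]) for word, count in tf.items()}
        for tf in tfs
    ]


# Usage with made-up toy lattices (illustrative values only).
videos = [
    [("birthday", 0.9), ("party", 0.6), ("the", 1.0)],
    [("the", 1.0), ("parade", 0.7)],
    [("birthday", 0.2), ("the", 0.8)],
]
vecs = tfidf_vectors(videos)
```

In this toy collection, "the" occurs in every lattice and so receives zero weight. A low-posterior hypothesis such as "birthday" in the third video still contributes a small nonzero weight instead of being dropped, which is the motivation the abstract gives for using lattices over noisy single-best output.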