• DocumentCode
    1668704
  • Title

    Graph based multimodal word clustering for video event detection

  • Author

    Vembu, Aravind ; Natarajan, Prem ; Wu, Shuang ; Prasad, Ranga ; Natarajan, Prem

  • Author_Institution
    Raytheon BBN Technologies, Cambridge, MA, USA
  • fYear
    2013
  • Firstpage
    3667
  • Lastpage
    3671
  • Abstract
    Combining diverse low-level features from multiple modalities has consistently improved performance over a range of video processing tasks, including event detection. In our work, we study graph-based clustering techniques for integrating information from multiple modalities by identifying word clusters spread across the different modalities. We present different methods to identify word clusters, including word similarity graph partitioning, word-video co-clustering, and Latent Semantic Indexing, and examine the impact of different metrics for quantifying the co-occurrence of words. We present experimental results on a dataset of ≈45,000 videos used in the TRECVID MED 11 evaluations. Our experiments show that multimodal features yield consistent performance gains over individual features. Further, word similarity graph construction using a complete graph representation consistently improves over partite graphs and early-fusion-based multimodal systems. Finally, we see additional performance gains by fusing multimodal features with individual features.
  • Keywords
    feature extraction; indexing; natural language interfaces; natural language processing; video signal processing; TRECVID MED 11 evaluations; complete graph representation; fusion based multimodal systems; graph based clustering techniques; graph based multimodal word clustering; information integration; latent semantic indexing; low-level features; multimodal features; multiple modalities; partite graphs; video dataset; video event detection; video processing tasks; word cluster identification; word similarity graph construction; word similarity graph partitioning; word-video coclustering; Event detection; Feature extraction; Kernel; Measurement; Mel frequency cepstral coefficient; Semantics; Training;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
  • Conference_Location
    Vancouver, BC
  • ISSN
    1520-6149
  • Type

    conf

  • DOI
    10.1109/ICASSP.2013.6638342
  • Filename
    6638342
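
The word similarity graph partitioning mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: cosine similarity stands in for the co-occurrence metric, a spectral bipartition (sign of the Fiedler vector of the normalized Laplacian) stands in for the partitioning step, and the toy count matrix, word labels, and function names are all hypothetical.

```python
import numpy as np

def cooccurrence_similarity(counts):
    """Cosine similarity between the word rows of a (words x videos) count matrix."""
    norms = np.linalg.norm(counts, axis=1, keepdims=True)
    unit = counts / np.maximum(norms, 1e-12)
    sim = unit @ unit.T
    np.fill_diagonal(sim, 0.0)  # no self-loops in the word graph
    return sim

def spectral_bipartition(sim):
    """Split the graph nodes in two using the sign of the Fiedler vector."""
    degree = sim.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-12))
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    laplacian = np.eye(len(sim)) - d_inv_sqrt[:, None] * sim * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]                 # eigenvector of the second-smallest eigenvalue
    return (fiedler > 0).astype(int)

# Hypothetical toy data: 4 words (rows) counted over 5 videos (columns).
# Because every word can link to every other word regardless of modality,
# this corresponds to a complete word graph rather than a partite one.
counts = np.array([
    [3, 2, 0, 0, 1],  # visual word A (assumed)
    [0, 0, 4, 1, 1],  # visual word B (assumed)
    [2, 3, 0, 0, 1],  # audio word A (assumed)
    [0, 0, 1, 5, 1],  # audio word B (assumed)
], dtype=float)

similarity = cooccurrence_similarity(counts)
labels = spectral_bipartition(similarity)
print(labels)
```

In this toy example each resulting cluster pairs a visual word with an audio word that tends to occur in the same videos, which is the kind of cross-modal word cluster the abstract describes. Which cluster receives label 0 or 1 depends on the arbitrary sign of the eigenvector.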