  • DocumentCode
    2951665
  • Title
    Combining text and audio-visual features in video indexing
  • Author
    Chang, Shih-Fu; Manmatha, R.; Chua, Tat-Seng
  • Author_Institution
    Dept. of Electr. Eng., Columbia Univ., New York, NY, USA
  • Volume
    5
  • fYear
    2005
  • fDate
    18-23 March 2005
  • Abstract
    We discuss the opportunities, state of the art, and open research issues in using multi-modal features in video indexing. Specifically, we focus on how imperfect text data obtained by automatic speech recognition (ASR) may be used to help solve challenging problems, such as story segmentation, concept detection, retrieval, and topic clustering. We review the frameworks and machine learning techniques that are used to fuse the text features with audio-visual features. Case studies showing promising performance are described, primarily in the broadcast news video domain.
  • Keywords
    database indexing; information retrieval; learning (artificial intelligence); speech recognition; text analysis; video databases; ASR; audio-visual features; automatic speech recognition; broadcast news video; concept detection; imperfect text data; machine learning; multi-modal features; retrieval; story segmentation; text features; topic clustering; video indexing; Automatic speech recognition; Computer science; Data mining; Fuses; Indexing; Information retrieval; Layout; Machine learning; Multimedia communication; Testing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 2005. Proceedings. (ICASSP '05). IEEE International Conference on
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-8874-7
  • Type
    conf
  • DOI
    10.1109/ICASSP.2005.1416476
  • Filename
    1416476