• DocumentCode
    1415956
  • Title

    Multimedia content analysis-using both audio and visual clues

  • Author

    Wang, Yao ; Liu, Zhu ; Huang, Jin-Cheng

  • Author_Institution
    Polytech. Univ. of Brooklyn, New York, NY, USA
  • Volume
    17
  • Issue
    6
  • fYear
    2000
  • fDate
    11/1/2000 12:00:00 AM
  • Firstpage
    12
  • Lastpage
    36
  • Abstract
    Multimedia content analysis refers to the computerized understanding of the semantic meanings of a multimedia document, such as a video sequence with an accompanying audio track. The semantics of a multimedia document are embedded in multiple forms that are usually complementary to each other. Therefore, it is necessary to analyze all types of data: image frames, sound tracks, texts that can be extracted from image frames, and spoken words that can be deciphered from the audio track. This usually involves segmenting the document into semantically meaningful units, classifying each unit into a predefined scene type, and indexing and summarizing the document for efficient retrieval and browsing. We review advances in using audio and visual information jointly for accomplishing the above tasks. We describe audio and visual features that can effectively characterize scene content, present selected algorithms for segmentation and classification, and review some testbed systems for video archiving and retrieval. We also describe audio and visual descriptors and description schemes that are being considered by the MPEG-7 standard for multimedia content description.
  • Keywords
    audio signal processing; content-based retrieval; image classification; image retrieval; image segmentation; image sequences; multimedia systems; telecommunication standards; video signal processing; MPEG-7 standard; algorithms; audio clues; audio descriptors; audio track; browsing; classification; computerized understanding; document retrieval; image frames; multimedia content analysis; multimedia content description; multimedia document; scene content; segmentation; semantic meanings; sound tracks; spoken words; testbed systems; texts; video archiving; video retrieval; video sequence; visual clues; visual descriptors; Data mining; Earthquakes; Image analysis; Image segmentation; Indexing; Information analysis; Information retrieval; Layout; Video sequences; Video sharing;
  • fLanguage
    English
  • Journal_Title
    Signal Processing Magazine, IEEE
  • Publisher
    IEEE
  • ISSN
    1053-5888
  • Type

    jour

  • DOI
    10.1109/79.888862
  • Filename
    888862