• DocumentCode
    3286206
  • Title
    A generic classification system for multi-channel audio indexing: Application to speech and music detection
  • Author
    Benaroya, Elie-Laurent; Peeters, G.

  • Author_Institution
    STMS IRCAM, Sound Anal./Synthesis Team, UPMC, Paris, France
  • fYear
    2013
  • fDate
    3-5 July 2013
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    The number of 3D audio-visual productions and archives is rising, which creates a need for indexing 3D content. Event detection using the audio modality is a difficult task. The standard way to classify 3D audio is to first down-mix it to mono and then classify the mono signal. In this paper, we describe a generic classifier for multi-channel audio event detection and propose several information fusion strategies. Our system is evaluated on a speech and music detection task using the audio of 3D movies. Compared to the standard downmixing method, we improve classification performance on our database by 1.5% for speech detection and by 8% for music detection. We also provide an experimental comparison of several information fusion methods.
  • Keywords
    audio signal processing; indexing; sensor fusion; speech processing; three-dimensional television; 3D audio classification; 3D audio-visual productions; 3D contents; 3D movies; audio modality; generic classification system; information fusion; mono audio; multichannel audio event detection; multichannel audio indexing; music detection; speech detection; standard downmixing method; Feature extraction; Frequency measurement; MONOS devices; Motion pictures; Speech; Support vector machines; Three-dimensional displays;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Image Analysis for Multimedia Interactive Services (WIAMIS), 2013 14th International Workshop on
  • Conference_Location
    Paris
  • ISSN
    2158-5873
  • Type
    conf
  • DOI
    10.1109/WIAMIS.2013.6616160
  • Filename
    6616160