  • DocumentCode
    164838
  • Title
    A speech event detection and localization task for multiroom environments
  • Author
    Brutti, Alessio; Ravanelli, Mirco; Svaizer, Piergiorgio; Omologo, Maurizio
  • Author_Institution
    Center for Inf. & Commun. Technol., Fondazione Bruno Kessler, Trento, Italy
  • fYear
    2014
  • fDate
    12-14 May 2014
  • Firstpage
    157
  • Lastpage
    161
  • Abstract
    Domestic environments are particularly challenging for distant speech recognition and for audio processing in general. Reverberation, background noise, and interfering sources, as well as the propagation of acoustic events across adjacent rooms, critically degrade the performance of standard speech processing algorithms. The DIRHA EU project addresses the development of distant-speech interaction with devices and services within the multiple rooms of typical apartments. A corpus of multichannel acoustic data has been created to represent realistic acoustic scenes, of varying degrees of complexity, occurring in such an environment. It includes multichannel simulations based on measured impulse responses, as well as real data collected in the same apartment. A basic but fundamental front-end processing task that enables effective ASR is the detection and localization of speech events generated by users, without constraints on their position or orientation within the various rooms. In this paper, we describe the acoustic corpus and present a baseline approach to the joint task of speech detection and source localization, using speech-related features such as pitch, combined with features derived from spatial coherence.
  • Keywords
    acoustic signal processing; audio signal processing; reverberation; speech recognition; ASR; DIRHA EU project; acoustic corpus; acoustic event propagation; audio processing; automatic speech recognition; background noise; distant speech recognition; distant-speech interaction; distant-speech interaction for robust home application; domestic environments; impulse responses; interfering sources; multichannel acoustic data; multiroom environments; realistic acoustic scenes; source localization; spatial coherence; speech event detection; speech event localization; standard speech processing algorithm; Acoustics; Joints; Microphone arrays; Noise; Speech; Speech recognition; Speech activity detection; acoustic corpora; distributed microphone networks
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Hands-free Speech Communication and Microphone Arrays (HSCMA), 2014 4th Joint Workshop on
  • Conference_Location
    Villers-les-Nancy
  • Type
    conf
  • DOI
    10.1109/HSCMA.2014.6843271
  • Filename
    6843271