• DocumentCode
    2176989
  • Title
    Making the most from multiple microphones in meeting recognition
  • Author
    Stolcke, Andreas
  • Author_Institution
    Speech Technol. & Res. Lab., SRI Int., Menlo Park, CA, USA
  • fYear
    2011
  • fDate
    22-27 May 2011
  • Firstpage
    4992
  • Lastpage
    4995
  • Abstract
    The use of multiple distant microphones has been widely studied for meeting recognition. The two most widely used approaches are 1) combination at the signal level, via blind beamforming, followed by recognition of a single enhanced audio signal, and 2) independent, logically parallel recognition of the multiple audio sources followed by hypothesis-level combination. In this paper we investigate how these two approaches compare for state-of-the-art recognition systems applied to meeting data from the two most recent NIST Rich Transcription evaluations. Our results show that beamforming is the superior approach, giving more accurate results while being inherently less computationally demanding. We then propose a hybrid approach that leverages both beamforming and signal-level diversity for system combination, and show that this approach gives gains over either of the old methods.
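The first approach in the abstract is signal-level combination by blind beamforming. A minimal delay-and-sum sketch makes it concrete. It assumes integer-sample delays, estimated by plain cross-correlation against channel 0 as a reference, and equal channel weights. The helper names `shift`, `estimate_delay`, and `delay_and_sum` are hypothetical. This is not the beamformer used in the paper, whose details the abstract does not give. Practical meeting beamformers typically add refinements such as GCC-PHAT delay estimation, per-segment delays, and channel weighting.

```python
import numpy as np


def shift(sig, d):
    """Advance `sig` by d samples (i.e. delay it if d < 0), zero-filling the gap."""
    out = np.zeros_like(sig)
    if d > 0:
        out[:-d] = sig[d:]
    elif d < 0:
        out[-d:] = sig[:d]
    else:
        out[:] = sig
    return out


def estimate_delay(ref, sig, max_lag):
    """Integer lag (in samples) by which `sig` trails `ref`: the peak of
    their cross-correlation, restricted to |lag| <= max_lag."""
    corr = np.correlate(sig, ref, mode="full")
    lags = np.arange(-(len(ref) - 1), len(sig))  # lag for each corr index
    mask = np.abs(lags) <= max_lag
    return int(lags[mask][np.argmax(corr[mask])])


def delay_and_sum(channels, max_lag):
    """Hypothetical blind beamformer: align every channel to channel 0
    (assumed reference) and average with equal weights."""
    ref = channels[0]
    delays = [estimate_delay(ref, ch, max_lag) for ch in channels]
    aligned = [shift(ch, d) for ch, d in zip(channels, delays)]
    return np.mean(aligned, axis=0), delays


# Usage: one synthetic white-noise "source" reaches four mics with
# different delays and independent additive noise.
rng = np.random.default_rng(0)
clean = rng.standard_normal(4000)
true_delays = [0, 5, -3, 12]
mics = [shift(clean, -d) + 0.5 * rng.standard_normal(clean.size) for d in true_delays]

enhanced, est = delay_and_sum(mics, max_lag=20)
print("estimated delays:", est)
```

Averaging N aligned channels leaves the source unchanged while independent noise adds incoherently, so the output noise power falls roughly by a factor of N. Only the single enhanced signal then has to be recognized. This fits the abstract's point that beamforming is inherently less computationally demanding than recognizing every channel separately.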
  • Keywords
    microphones; speech recognition; ASR; NIST rich transcription evaluations; automatic speech recognition; hypothesis-level combination; meeting recognition; multiple microphones; single enhanced audio signal; Meeting recognition; blind beamforming; system combination;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on
  • Conference_Location
    Prague
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4577-0538-0
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2011.5947477
  • Filename
    5947477