Regional Information Center for Science and Technology - Making the most from multiple microphones in meeting recognition

DocumentCode :

2176989

Title :

Making the most from multiple microphones in meeting recognition

Author :

Stolcke, Andreas

Author_Institution :

Speech Technol. & Res. Lab., SRI Int., Menlo Park, CA, USA

fYear :

2011

fDate :

22-27 May 2011

Firstpage :

4992

Lastpage :

4995

Abstract :

The use of multiple distant microphones has been widely studied for meeting recognition. The two most widely used approaches are 1) combination at the signal level, via blind beamforming, followed by recognition of a single enhanced audio signal, and 2) independent, logically parallel recognition of the multiple audio sources followed by hypothesis-level combination. In this paper we investigate how these two approaches compare for state-of-the-art recognition systems applied to meeting data from the two most recent NIST Rich Transcription evaluations. Our results show that beamforming is the superior approach, giving more accurate results while being inherently less computationally demanding. We then propose a hybrid approach that leverages both beamforming and signal-level diversity for system combination, and show that this approach gives gains over either of the old methods.

Keywords :

microphones; speech recognition; ASR; NIST rich transcription evaluations; automatic speech recognition; hypothesis-level combination; meeting recognition; multiple microphones; single enhanced audio signal; blind beamforming; system combination

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on

Conference_Location :

Prague

ISSN :

1520-6149

Print_ISBN :

978-1-4577-0538-0

Electronic_ISBN :

1520-6149

Type :

conf

DOI :

10.1109/ICASSP.2011.5947477

Filename :

5947477

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2176989