Regional Information Center for Science and Technology - Multispeaker speech activity detection for the ICSI meeting recorder

DocumentCode :

2279497

Title :

Multispeaker speech activity detection for the ICSI meeting recorder

Author :

Pfau, Thilo ; Ellis, Daniel P W ; Stolcke, Andreas

Author_Institution :

Int. Comput. Sci. Inst., Berkeley, CA, USA

fYear :

2001

fDate :

2001

Firstpage :

107

Lastpage :

110

Abstract :

As part of a project on speech recognition in meeting environments, we have collected a corpus of multichannel meeting recordings. We expected the identification of speaker activity to be straightforward given that the participants had individual microphones, but simple approaches yielded unacceptably erroneous labelings, mainly due to crosstalk between nearby speakers and wide variations in channel characteristics. Therefore, we have developed a more sophisticated approach for multichannel speech activity detection using a simple hidden Markov model (HMM). A baseline HMM speech activity detector has been extended to use mixtures of Gaussians to achieve robustness for different speakers under different conditions. Feature normalization and crosscorrelation processing are used to increase the channel independence and to detect crosstalk. The use of both energy normalization and crosscorrelation-based postprocessing results in a 35% relative reduction of the frame error rate. Speech recognition experiments show that it is beneficial in this multispeaker setting to use the output of the speech activity detector for presegmenting the recognizer input, achieving word error rates within 10% of those achieved with manual turn labeling.

Keywords :

Gaussian distribution; correlation methods; echo suppression; error statistics; feature extraction; hidden Markov models; speaker recognition; Gaussian mixtures; HMM; ICSI meeting recorder; channel independence; crosscorrelation processing; crosstalk detection; energy normalization; feature normalization; frame error rate reduction; hidden Markov model; multichannel speech activity detection; multispeaker speech activity detection; recognizer input presegmentation; robustness; speaker identification; speech recognition; word error rates; Computer science; Crosstalk; Detectors; Hidden Markov models; Labeling; Microphones; Microwave integrated circuits; Noise level; Silicon compounds; Speech recognition;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Automatic Speech Recognition and Understanding, 2001. ASRU '01. IEEE Workshop on

Print_ISBN :

0-7803-7343-X

Type :

conf

DOI :

10.1109/ASRU.2001.1034599

Filename :

1034599

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2279497