DocumentCode
1689271
Title
A practical, self-adaptive voice activity detector for speaker verification with noisy telephone and microphone data
Author
Kinnunen, Tomi ; Rajan, Parvathy
Author_Institution
Sch. of Comput., Univ. of Eastern Finland (UEF), Joensuu, Finland
fYear
2013
Firstpage
7229
Lastpage
7233
Abstract
A voice activity detector (VAD) plays a vital role in robust speaker verification, where energy VAD is most commonly used. Energy VAD works well in noise-free conditions but degrades in noisy conditions. One way to tackle this is to introduce speech enhancement preprocessing. We study an alternative: a likelihood ratio based VAD that trains speech and nonspeech models on an utterance-by-utterance basis from mel-frequency cepstral coefficients (MFCCs). The training labels are obtained from enhanced energy VAD. Because the speech and nonspeech models are retrained for each utterance, minimal assumptions are made about the background noise. According to both VAD error analysis and speaker verification results obtained with a state-of-the-art i-vector system, the proposed method outperforms energy VAD variants by a wide margin. We provide an open-source implementation of the method.
Keywords
speaker recognition; voice communication; MFCC; VAD error analysis; enhanced energy VAD; likelihood ratio based VAD; mel-frequency cepstral coefficients; microphone data; noise free conditions; noisy conditions; noisy telephone; nonspeech models; robust speaker verification; self-adaptive voice activity detector; speech enhancement preprocessing; utterance by utterance basis; NIST; Noise measurement; Signal to noise ratio; Speaker recognition; Speech; Training; Voice activity detection; speaker verification;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location
Vancouver, BC
ISSN
1520-6149
Type
conf
DOI
10.1109/ICASSP.2013.6639066
Filename
6639066