Title :
A practical, self-adaptive voice activity detector for speaker verification with noisy telephone and microphone data
Author :
Kinnunen, Tomi ; Rajan, Parvathy
Author_Institution :
Sch. of Comput., Univ. of Eastern Finland (UEF), Joensuu, Finland
Abstract :
A voice activity detector (VAD) plays a vital role in robust speaker verification, where energy-based VAD is the most commonly used approach. Energy VAD works well in noise-free conditions but deteriorates in noisy conditions. One way to tackle this is to introduce speech enhancement preprocessing. We study an alternative: a likelihood-ratio-based VAD that trains speech and nonspeech models on an utterance-by-utterance basis from mel-frequency cepstral coefficients (MFCCs). The training labels are obtained from an enhanced energy VAD. Because the speech and nonspeech models are retrained for each utterance, minimal assumptions about the background noise are made. According to both VAD error analysis and speaker verification results using a state-of-the-art i-vector system, the proposed method outperforms energy VAD variants by a wide margin. We provide an open-source implementation of the method.
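The self-adaptive idea in the abstract (energy VAD labels bootstrap per-utterance speech and nonspeech models, which then relabel every frame by likelihood ratio) can be illustrated with the minimal sketch below. It is not the authors' open-source implementation. Several simplifications are assumptions made for illustration only: the speech enhancement step is omitted, the energy VAD is a hypothetical "within 30 dB of the loudest frame" rule, each class is modeled by a single diagonal Gaussian instead of a GMM, synthetic vectors stand in for MFCCs, and the names `energy_vad`, `self_adaptive_vad` and `dynamic_range_db` are hypothetical.

```python
import numpy as np

def energy_vad(log_energy_db, dynamic_range_db=30.0):
    """Hypothetical energy VAD: frames within dynamic_range_db of the loudest frame are speech."""
    return log_energy_db > (log_energy_db.max() - dynamic_range_db)

def fit_diag_gaussian(x, var_floor=1e-3):
    """Fit one diagonal Gaussian (a simplified stand-in for a GMM) to the rows of x."""
    return x.mean(axis=0), np.maximum(x.var(axis=0), var_floor)

def diag_gaussian_loglik(x, mean, var):
    """Per-frame log-likelihood of the rows of x under a diagonal Gaussian."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var, axis=1)

def self_adaptive_vad(features, log_energy_db, dynamic_range_db=30.0, llr_threshold=0.0):
    """Retrain speech/nonspeech models on this utterance only, then relabel all frames by LLR."""
    init = energy_vad(log_energy_db, dynamic_range_db)
    if init.all() or not init.any():
        return init  # one class is empty, so both models cannot be trained; keep energy labels
    speech_model = fit_diag_gaussian(features[init])
    nonspeech_model = fit_diag_gaussian(features[~init])
    llr = (diag_gaussian_loglik(features, *speech_model)
           - diag_gaussian_loglik(features, *nonspeech_model))
    return llr > llr_threshold

# Synthetic utterance: 50 speech frames then 50 nonspeech frames, 13-dimensional "MFCCs".
rng = np.random.default_rng(0)
truth = np.r_[np.ones(50, dtype=bool), np.zeros(50, dtype=bool)]
features = np.vstack([rng.normal(5.0, 1.0, (50, 13)), rng.normal(0.0, 1.0, (50, 13))])
log_energy_db = np.r_[rng.normal(60.0, 2.0, 50), rng.normal(20.0, 2.0, 50)]
log_energy_db[:5] = rng.normal(20.0, 2.0, 5)  # five quiet speech frames the energy VAD will miss

energy_labels = energy_vad(log_energy_db)
vad_labels = self_adaptive_vad(features, log_energy_db)
print("energy VAD errors:", int((energy_labels != truth).sum()))
print("self-adaptive VAD errors:", int((vad_labels != truth).sum()))
```

In this toy setup the energy rule misclassifies the five quiet speech frames, while the likelihood ratio computed from the feature-domain models recovers them. This mirrors, in a much simplified form, why retraining models per utterance can tolerate imperfect bootstrap labels.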
Keywords :
speaker recognition; voice communication; MFCC; VAD error analysis; enhanced energy VAD; likelihood ratio based VAD; mel-frequency cepstral coefficients; microphone data; noise free conditions; noisy conditions; noisy telephone; nonspeech models; robust speaker verification; self-adaptive voice activity detector; speech enhancement preprocessing; utterance by utterance basis; NIST; Noise measurement; Signal to noise ratio; Speaker recognition; Speech; Training; Voice activity detection; speaker verification;
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location :
Vancouver, BC
DOI :
10.1109/ICASSP.2013.6639066