Title :
Statistical voice activity detection using a multiple observation likelihood ratio test
Author :
Ramírez, Javier ; Segura, José C. ; Benítez, Carmen ; García, Luz ; Rubio, Antonio
Author_Institution :
Dept. de Teoría de la Señal, Univ. de Granada, Spain
Abstract :
Currently, technology barriers prevent speech processing systems that work in extremely noisy conditions from meeting the demands of modern applications. This letter presents a new voice activity detector (VAD) for improving speech detection robustness in noisy environments and the performance of speech recognition systems. The algorithm defines an optimum likelihood ratio test (LRT) involving multiple independent observations. The resulting decision rule yields significant improvements in speech/nonspeech discrimination accuracy over existing VAD methods that are defined on a single observation and need empirically tuned hangover mechanisms. The algorithm has an inherent delay that, for several applications, including robust speech recognition, does not represent a serious implementation obstacle. An analysis of the overlap between the distributions of the decision variable shows the improved robustness of the proposed approach by means of a clear reduction of the classification error as the number of observations is increased. The proposed strategy is also compared to different VAD methods, including the G.729, AMR, and AFE standards, as well as recently reported algorithms, and shows a sustained advantage in speech/nonspeech detection accuracy and speech recognition performance.
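The multiple-observation decision rule described in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption: each frame's log-likelihood ratio is computed under the complex-Gaussian model commonly used in statistical VADs, and, because the observations are treated as independent, the joint test reduces to summing the per-frame log-likelihood ratios over a window of 2m + 1 frames centered on the current one. The function names, the window parameter `m`, and the threshold are hypothetical and not taken from the letter.

```python
import numpy as np

def frame_log_lr(power_spectrum, noise_psd, xi):
    """Log-likelihood ratio of one frame (assumed complex-Gaussian model).

    power_spectrum: |X_k|^2 per frequency bin
    noise_psd:      estimated noise variance per bin
    xi:             a priori SNR per bin (assumed to be supplied by the caller)
    """
    gamma = power_spectrum / noise_psd  # a posteriori SNR per bin
    return float(np.sum(gamma * xi / (1.0 + xi) - np.log1p(xi)))

def mo_lrt_decisions(frame_llrs, m, threshold):
    """Sum per-frame log-LRs over frames l-m .. l+m and compare to a threshold.

    Independence of the observations turns the joint likelihood ratio into
    a product, i.e. a sum in the log domain. Frames beyond the edges of the
    signal are treated as contributing zero (an assumption of this sketch).
    """
    llrs = np.asarray(frame_llrs, dtype=float)
    padded = np.pad(llrs, m)             # zero-pad m frames on each side
    window = np.ones(2 * m + 1)
    summed = np.convolve(padded, window, mode="valid")  # one value per frame
    return summed > threshold

# A single strong speech-like frame surrounded by weak ones.
llrs = [-1.0, -1.0, 5.0, -1.0, -1.0]
single = mo_lrt_decisions(llrs, m=0, threshold=0.0)  # single-observation test
multi = mo_lrt_decisions(llrs, m=1, threshold=0.0)   # three-frame window
```

In this sketch the decision for frame l needs the m frames that follow it, which is consistent with the inherent delay mentioned in the abstract. Widening the window also extends the detection around the strong frame without a separate hangover rule.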
Keywords :
adaptive signal detection; maximum likelihood detection; signal classification; speech recognition; AFE standard; AMR standard; G.729 standard; MO-LRT; VAD; adaptive multirate; classification error; decision rule; empirically tuned hangover mechanism; multiple observation likelihood ratio test; speech processing system; speech recognition system; speech-nonspeech discrimination; statistical voice activity detection; Acoustical engineering; Detectors; Light rail systems; Noise reduction; Noise robustness; Speech enhancement; Speech processing; Speech recognition; System testing; Working environment noise; Multiple observation likelihood ratio test (MO-LRT); robust speech recognition; voice activity detection;
Journal_Title :
Signal Processing Letters, IEEE
DOI :
10.1109/LSP.2005.855551