Regional Information Center for Science and Technology - Improved Voice Activity Detection Using Contextual Multiple Hypothesis Testing for Robust Speech Recognition

DocumentCode :

1224439

Title :

Improved Voice Activity Detection Using Contextual Multiple Hypothesis Testing for Robust Speech Recognition

Author :

Ramírez, Javier ; Segura, José C. ; Górriz, Juan M. ; García, Luz

Author_Institution :

Dept. of Signal Theor., Univ. of Granada, Granada

Volume :

Issue :

Year :

2007

Firstpage :

2177

Lastpage :

2189

Abstract :

This paper presents an improved statistical test for voice activity detection in noise-adverse environments. The method is based on a revised contextual likelihood ratio test (LRT) defined over a multiple observation window. The motivation for revising the original multiple observation LRT (MO-LRT) lies in its artificially added hangover mechanism, which behaves incorrectly under different signal-to-noise ratio (SNR) conditions. The new approach defines a maximum a posteriori (MAP) statistical test that considers all the global hypotheses on the multiple observation window containing at most one speech-to-nonspeech or nonspeech-to-speech transition. Thus, the implicit hangover mechanism artificially added by the original method is not present in the revised method, so its design can be further improved. With these and other innovations, the proposed method shows higher speech/nonspeech discrimination accuracy over a wide range of SNR conditions when compared to the original MO-LRT voice activity detector (VAD). Experiments conducted on the AURORA databases and tasks show that the revised method yields significant improvements in speech recognition performance over standardized VADs such as ITU-T G.729 and ETSI AMR for discontinuous voice transmission and the ETSI AFE for distributed speech recognition (DSR), as well as over recently reported methods.
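The idea of testing every global hypothesis with at most one transition can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes per-frame log-likelihood ratios are already available, assumes equal priors for all global hypotheses, and uses the hypothetical function names `enumerate_hypotheses` and `map_center_decision`.

```python
from typing import List, Sequence, Tuple


def enumerate_hypotheses(n: int) -> List[Tuple[int, ...]]:
    """Label sequences of length n (1 = speech, 0 = nonspeech)
    containing at most one speech/nonspeech transition."""
    hyps = [tuple([0] * n), tuple([1] * n)]          # no transition
    for t in range(1, n):
        hyps.append(tuple([0] * t + [1] * (n - t)))  # nonspeech-to-speech at t
        hyps.append(tuple([1] * t + [0] * (n - t)))  # speech-to-nonspeech at t
    return hyps


def map_center_decision(llrs: Sequence[float]) -> int:
    """Pick the best global hypothesis for a window of per-frame
    log-likelihood ratios log p(x|speech)/p(x|nonspeech) and return
    its label for the center frame.

    Assumption: equal priors, so the MAP choice reduces to the
    hypothesis with the highest likelihood. Each score is measured
    relative to the all-nonspeech hypothesis, i.e. it is the sum of
    the ratios of the frames labelled as speech.
    """
    n = len(llrs)
    if n % 2 == 0:
        raise ValueError("window length must be odd to have a center frame")
    best = max(
        enumerate_hypotheses(n),
        key=lambda h: sum(l for l, s in zip(llrs, h) if s),
    )
    return best[n // 2]


# Window of 5 frames whose last three look like speech.
print(map_center_decision([-2.0, -2.0, 3.0, 3.0, 3.0]))
```

In this sketch a window of n frames yields 2n global hypotheses, and an isolated high ratio at the center frame surrounded by low ratios is still labelled nonspeech, because every hypothesis that marks the center as speech must also mark a contiguous run of neighbours up to one window edge.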

Keywords :

maximum likelihood estimation; speech recognition; statistical testing; voice communication; AURORA database; contextual multiple hypothesis testing; discontinuous voice transmission; distributed speech recognition; likelihood ratio test; maximum a posteriori statistical test; nonspeech-to-speech transition; robust speech recognition; speech-to-nonspeech transition; voice activity detection; Design methodology; Detectors; Light rail systems; Noise robustness; Signal to noise ratio; Speech recognition; Technological innovation; Telecommunication standards; Testing; Working environment noise; Multiple hypothesis testing; robust speech recognition; voice activity detection (VAD)

Language :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1558-7916

Type :

Journal article

DOI :

10.1109/TASL.2007.903937

Filename :

4317575

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1224439