Title :
A Method of Joint Compensation of Additive and Convolutive Distortions for Speaker-Independent Speech Recognition
Author_Institution :
DSP Solutions R&D Center, Speech Technol. Lab., Dallas, TX, USA
Abstract :
A speech recognizer operating in a mobile environment has to be robust to two distortion sources: ambient noise (additive distortion) and microphone changes (convolutive distortion). Explicitly and simultaneously modeling the two distortion sources has been a great challenge for speech recognition in adverse environments. In this paper, two log-spectral domain components are introduced in speech acoustic models to represent additive and convolutive distortions. A method, called JAC, jointly compensates both additive and convolutive distortions. For each utterance to be recognized, it adapts HMM mean vectors with a noise estimate and a channel estimate. The noise estimate is calculated from the pre-utterance pause and the channel estimate is calculated using an EM algorithm from speech utterances produced in the distortion environment. The algorithm is evaluated on a noisy speech database recorded in-vehicle with a hands-free distant microphone in several sessions, including parked, stop-and-go, and highway driving conditions. Experiments show that the method typically reduces recognition word error rate by an order of magnitude. The method makes it possible to obtain high performance for speaker-independent recognition in changing noisy environments without collecting any noisy speech for training.
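The per-utterance mean adaptation described in the abstract can be illustrated with a minimal sketch. It assumes the commonly used log-spectral mismatch function, in which each adapted mean bin is log(exp(m + h) + exp(n)) for clean mean m, channel estimate h, and noise estimate n. The abstract does not state the formula, so this form, the function name `compensate_mean`, and the numeric values are illustrative assumptions, not the paper's implementation. Estimating h with EM and n from the pre-utterance pause is outside the sketch.

```python
import math

def compensate_mean(clean_mean, channel, noise):
    """Adapt one clean-speech log-spectral mean vector (hypothetical helper).

    Assumed mismatch function per frequency bin:
        adapted = log(exp(m + h) + exp(n))
    computed in a numerically stable way.
    """
    adapted = []
    for m, h, n in zip(clean_mean, channel, noise):
        a = m + h                      # convolutive distortion adds in the log domain
        hi, lo = max(a, n), min(a, n)  # factor out the larger term for stability
        adapted.append(hi + math.log1p(math.exp(lo - hi)))
    return adapted

# Illustrative values only: three log-spectral bins.
clean_mean = [2.0, 1.0, 0.5]   # clean-speech HMM mean
channel = [0.0, -0.3, 0.2]     # channel estimate (assumed given)
noise = [-1.0, 0.5, 3.0]       # noise estimate (assumed given)
adapted = compensate_mean(clean_mean, channel, noise)
```

Under this assumed form, a bin where noise is far below the channel-shifted speech stays close to m + h, while a bin dominated by noise moves toward n, which is why both estimates are needed together.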
Keywords :
acoustic convolution; acoustic distortion; channel estimation; hidden Markov models; speech recognition; additive distortions; ambient noise; convolutive distortions; hands-free distant microphone; joint compensation method; log-spectral domain components; microphone changes; speaker-independent speech recognition; Acoustic noise; Additive noise; Microphones; Noise robustness; Speech analysis; Speech enhancement; Working environment noise; Additive and convolutive distortion; EM algorithm; noisy speech recognition; robust speech recognition; speech signal modeling
Journal_Title :
Speech and Audio Processing, IEEE Transactions on
DOI :
10.1109/TSA.2005.851963