Title :
Efficient MMSE Estimation and Uncertainty Processing for Multienvironment Robust Speech Recognition
Author :
González, Jose A. ; Peinado, Antonio M. ; Gómez, Angel M. ; Carmona, José L.
Author_Institution :
Dept. of Teor. de la Señal, Telemática y Comun., Univ. de Granada, Granada, Spain
fDate :
7/1/2011
Abstract :
This paper presents a feature compensation framework for robust speech recognition based on minimum mean square error (MMSE) estimation and stereo training data. In our proposal, we model the clean and noisy feature spaces in order to obtain clean feature estimates. However, unlike other well-known MMSE compensation methods such as SPLICE or MEMLIN, which model those spaces with Gaussian mixture models (GMMs), in our case each feature space is characterized by a set of prototype vectors, which can alternatively be regarded as a vector quantization (VQ) codebook. The discrete nature of this feature space characterization brings two significant advantages. First, it allows the implementation of an MMSE estimator that is very efficient in terms of both accuracy and computational cost. Second, time correlations can be exploited by means of hidden Markov modeling (HMM). In addition, a novel subregion-based modeling is applied in order to accurately represent the transformation between the clean and noisy domains. To deal with unknown environments, a multiple-model approach is also explored. Since this approach has been shown to be quite sensitive to incorrect environment classification, we adapt two uncertainty processing techniques, soft-data decoding and exponential weighting, to our estimation framework. As a result, environment misclassifications are concealed, yielding better performance under unknown environments. Experimental results on noisy digit recognition show a relative improvement in word accuracy of 87.93% over the baseline when clean acoustic models are used, and of 4.54% with multi-style trained models.
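The central idea above, MMSE estimation over discrete prototype codebooks learned from stereo data, can be illustrated with a minimal frame-by-frame sketch. This is not the authors' estimator; it is an illustration under stated assumptions: the clean and noisy codebooks are taken as given, the clean-given-noisy cell mapping P(i | j) is estimated by simple co-occurrence counting over stereo (clean, noisy) frame pairs, P(j | y) assumes isotropic Gaussians with a shared variance sigma**2 around each noisy prototype, and the HMM time modeling, subregion modeling, and multi-environment uncertainty processing are all omitted. The function names are hypothetical.

```python
import math


def nearest(codebook, v):
    """Index of the prototype closest to v (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(codebook[k], v)))


def train_mapping(clean_cb, noisy_cb, stereo_pairs):
    """Estimate P(i | j), the probability of clean cell i given noisy cell j,
    from co-occurrence counts over stereo (clean, noisy) frame pairs."""
    counts = [[0.0] * len(clean_cb) for _ in noisy_cb]
    for x, y in stereo_pairs:
        counts[nearest(noisy_cb, y)][nearest(clean_cb, x)] += 1.0
    mapping = []
    for row in counts:
        total = sum(row)
        # Uniform fallback for noisy cells never observed in training.
        mapping.append([c / total for c in row] if total else [1.0 / len(row)] * len(row))
    return mapping


def noisy_posteriors(noisy_cb, y, sigma=1.0):
    """Soft assignment P(j | y), assuming isotropic Gaussians with shared
    variance sigma**2 around each noisy prototype (an illustrative choice)."""
    logs = [-sum((a - b) ** 2 for a, b in zip(c, y)) / (2 * sigma ** 2) for c in noisy_cb]
    m = max(logs)  # subtract the max log-likelihood for numerical stability
    w = [math.exp(l - m) for l in logs]
    s = sum(w)
    return [v / s for v in w]


def mmse_estimate(clean_cb, noisy_cb, mapping, y, sigma=1.0):
    """MMSE clean estimate: x_hat = sum_i clean_cb[i] * sum_j P(i | j) P(j | y)."""
    p_noisy = noisy_posteriors(noisy_cb, y, sigma)
    p_clean = [sum(mapping[j][i] * p_noisy[j] for j in range(len(noisy_cb)))
               for i in range(len(clean_cb))]
    dim = len(clean_cb[0])
    return [sum(p_clean[i] * clean_cb[i][d] for i in range(len(clean_cb)))
            for d in range(dim)]


# Toy usage: 2-D features where the "noise" is a constant +1 shift.
clean_cb = [[0.0, 0.0], [4.0, 4.0]]
noisy_cb = [[1.0, 1.0], [5.0, 5.0]]
stereo = [([0.1, 0.0], [1.1, 1.0]),
          ([3.9, 4.1], [4.9, 5.1]),
          ([0.0, 0.2], [1.0, 1.2])]
mapping = train_mapping(clean_cb, noisy_cb, stereo)
x_hat = mmse_estimate(clean_cb, noisy_cb, mapping, [3.0, 3.0])
print(x_hat)  # [2.0, 2.0]: y is equidistant from both noisy prototypes
```

Because each feature space is discrete, the estimate reduces to a weighted sum of clean prototypes, which is what makes this kind of estimator cheap compared with evaluating a full GMM-based transformation; a frame lying between two noisy cells receives a proportional blend of the corresponding clean prototypes.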
Keywords :
Gaussian processes; compensation; decoding; hidden Markov models; least mean squares methods; signal classification; speech coding; speech recognition; vector quantisation; GMM; Gaussian mixture models; MMSE estimation method; acoustic models; exponential weighting; feature compensation framework; hidden Markov modeling; minimum mean square error estimation; multienvironment robust speech recognition; multiple-model approach; soft-data decoding; stereo training data; subregion-based modeling; vector quantization codebook; Acoustics; Adaptation model; Computational modeling; Estimation; Hidden Markov models; Noise; Noise measurement; Feature vector compensation; minimum mean square error (MMSE) estimation; robust speech recognition; stereo-data;
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
DOI :
10.1109/TASL.2010.2087753