DocumentCode :
76952
Title :
MMSE-Based Missing-Feature Reconstruction With Temporal Modeling for Robust Speech Recognition
Author :
Gonzalez, Jose A. ; Peinado, Antonio M. ; Ning Ma ; Gomez, Angel M. ; Barker, J.
Author_Institution :
Dept. of Teor. de la Senal Telematica y Comun., Univ. de Granada, Granada, Spain
Volume :
21
Issue :
3
fYear :
2013
fDate :
March 2013
Firstpage :
624
Lastpage :
635
Abstract :
This paper addresses the problem of feature compensation in the log-spectral domain using the missing-data (MD) approach to noise-robust speech recognition, in which each log-spectral feature is assumed to be either almost unaffected by noise or completely masked by it. First, a general MD framework based on minimum mean square error (MMSE) estimation is introduced that exploits the correlation across frequency bands to reconstruct the missing features. This framework allows the derivation of different MD imputation approaches and, in particular, a novel technique taking advantage of truncated Gaussian distributions is presented. While the proposed technique provides excellent results at high and medium signal-to-noise ratios (SNRs), its performance diminishes at low SNRs, where very few reliable features are available. The reconstruction technique is therefore extended to exploit temporal constraints using two different approaches. In the first approach, time-frequency patches of speech containing a number of consecutive frames are modeled using a Gaussian mixture model (GMM). In the second, the sequential structure of speech is instead modeled by a hidden Markov model (HMM). The proposed techniques are evaluated on the Aurora-2 and Aurora-4 databases using both oracle and estimated masks. In both cases, the proposed techniques achieve better recognition performance than the baseline system and other related techniques. The introduction of temporal modeling also turns out to be very effective in reconstructing spectra at low SNRs. In particular, HMMs show the highest capability of accounting for time correlations and, therefore, achieve the best results.
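The core idea of the general framework (reconstructing masked log-spectral features from the reliable ones through cross-band correlation) can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a full-covariance GMM prior over clean log-spectra and computes the plain conditional-mean (MMSE) estimate of the missing bands given the reliable ones. The truncated-Gaussian refinement and the temporal (patch-based and HMM-based) extensions mentioned in the abstract are deliberately left out, and the function names `log_gauss` and `mmse_impute` and all parameter names are hypothetical.

```python
import numpy as np


def log_gauss(x, mean, cov):
    """Log-density of a multivariate Gaussian evaluated at x."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (x.size * np.log(2.0 * np.pi) + logdet + maha)


def mmse_impute(y, reliable, weights, means, covs):
    """Replace unreliable entries of y by their GMM conditional mean.

    y        : (D,) observed log-spectral frame
    reliable : (D,) boolean mask, True where the band is trusted
    weights  : (K,) mixture weights (assumed clean-speech prior)
    means    : (K, D) component means
    covs     : (K, D, D) full component covariances
    """
    r = np.asarray(reliable, dtype=bool)
    u = ~r
    x_hat = np.array(y, dtype=float)
    if not u.any():
        return x_hat  # nothing to reconstruct

    n_comp = len(weights)
    log_post = np.log(np.asarray(weights, dtype=float))
    cond_means = np.empty((n_comp, int(u.sum())))
    for k in range(n_comp):
        mu_r, mu_u = means[k][r], means[k][u]
        cond_means[k] = mu_u
        if r.any():
            s_rr = covs[k][np.ix_(r, r)]
            s_ur = covs[k][np.ix_(u, r)]
            # Component posterior uses only the reliable bands.
            log_post[k] += log_gauss(x_hat[r], mu_r, s_rr)
            # Conditional mean exploits cross-band correlation s_ur.
            cond_means[k] += s_ur @ np.linalg.solve(s_rr, x_hat[r] - mu_r)

    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    x_hat[u] = post @ cond_means  # posterior-weighted MMSE estimate
    return x_hat


# Toy usage: one component, two correlated bands, second band masked.
# The masked value is replaced by 0.5 * 2.0 = 1.0.
toy = mmse_impute(
    y=np.array([2.0, 99.0]),
    reliable=np.array([True, False]),
    weights=np.array([1.0]),
    means=np.zeros((1, 2)),
    covs=np.array([[[1.0, 0.5], [0.5, 1.0]]]),
)
print(toy)
```

With very few reliable bands the component posterior becomes nearly uninformative and the estimate drifts toward the prior mean, which is consistent with the degradation at low SNRs that motivates the temporal extensions.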
Keywords :
Gaussian distribution; hidden Markov models; least mean squares methods; speech recognition; Aurora-2 databases; Aurora-4 databases; GMM; Gaussian mixture model; HMM; MD approach; MD imputation approaches; MMSE-based missing-feature reconstruction; feature compensation; hidden Markov model; log-spectral domain; log-spectral features; minimum mean square error estimation; missing-data approach; robust speech recognition; signal-to-noise ratios; speech containing; speech sequential structure; temporal constraints; temporal modeling; time-frequency patches; truncated Gaussian distributions; Correlation; Covariance matrix; Estimation; Hidden Markov models; Noise; Reliability; Speech; Minimum mean square error estimation; missing-feature; robust speech recognition; spectral reconstruction;
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2012.2229982
Filename :
6362180