DocumentCode :
1511004
Title :
Missing-Feature Reconstruction by Leveraging Temporal Spectral Correlation for Robust Speech Recognition in Background Noise Conditions
Author :
Kim, Wooil ; Hansen, John H L
Author_Institution :
Center for Robust Speech Syst. (CRSS), Univ. of Texas at Dallas, Richardson, TX, USA
Volume :
18
Issue :
8
Year :
2010
Firstpage :
2111
Lastpage :
2120
Abstract :
This paper proposes a novel missing-feature reconstruction method to improve speech recognition in background noise environments. The existing missing-feature reconstruction method utilizes log-spectral correlation across frequency bands. We propose a temporal spectral feature analysis that improves missing-feature reconstruction performance by leveraging temporal correlation across neighboring frames. In a manner similar to the conventional method, a Gaussian mixture model is trained on the resulting temporal spectral feature set. The final estimates for missing-feature reconstruction are obtained by selectively combining the original frequency correlation-based method and the proposed temporal correlation-based method. Performance of the proposed method is evaluated on the TIMIT speech corpus under various types of background noise and on the CU-Move in-vehicle speech corpus. Experimental results demonstrate that the proposed method is more effective than the conventional method at improving speech recognition performance in adverse conditions. With the proposed temporal-frequency-based reconstruction method, a +17.71% average relative improvement in word error rate (WER) is obtained for white, car, speech babble, and background music conditions at 5-, 10-, and 15-dB SNR, compared to the original frequency correlation-based method. A +16.72% relative improvement is also obtained in real-life in-vehicle conditions using data from the CU-Move corpus.
Keywords :
Gaussian processes; correlation methods; noise; speech recognition; CU-Move in-vehicle speech corpus; Gaussian mixture model; TIMIT speech corpus; background noise conditions; missing feature reconstruction; robust speech recognition; temporal spectral correlation; temporal spectral feature analysis; Background noise; Error analysis; Frequency estimation; Noise robustness; Performance analysis; Reconstruction algorithms; Spectral analysis; Speech analysis; Speech enhancement; missing-feature; temporal correlation; temporal spectral feature
Language :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
Journal article
DOI :
10.1109/TASL.2010.2041698
Filename :
5482073