DocumentCode :
1097171
Title :
Transforming Binary Uncertainties for Robust Speech Recognition
Author :
Srinivasan, Soundararajan ; Wang, DeLiang
Author_Institution :
Ohio State Univ., Columbus
Volume :
15
Issue :
7
Year :
2007
Firstpage :
2130
Lastpage :
2140
Abstract :
Recently, several algorithms have been proposed to enhance noisy speech by estimating a binary mask that can be used to select those time-frequency regions of a noisy speech signal that contain more speech energy than noise energy. This binary mask encodes the uncertainty associated with enhanced speech in the linear spectral domain. The use of the cepstral transformation smears the information from the noise-dominant time-frequency regions across all the cepstral features. We propose a supervised approach using regression trees to learn the nonlinear transformation of the uncertainty from the linear spectral domain to the cepstral domain. This uncertainty is used by a decoder that exploits the variance associated with the enhanced cepstral features to improve robust speech recognition. Systematic evaluations on a subset of the Aurora4 task using the estimated uncertainty show substantial improvement over the baseline performance across various noise conditions.
Keywords :
regression analysis; speech intelligibility; speech recognition; trees (mathematics); binary mask; linear spectral domain; noisy speech signal; regression trees; robust speech recognition; time-frequency regions; Acoustic noise; Automatic speech recognition; Biomedical engineering; Cepstral analysis; Decoding; Noise robustness; Speech coding; Speech enhancement; Speech recognition; Uncertainty; Binary time–frequency mask; computational auditory scene analysis (CASA); robust automatic speech recognition; spectrogram reconstruction; uncertainty decoding;
Language :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2007.901836
Filename :
4291614