DocumentCode :
950799
Title :
Enhancement of log Mel power spectra of speech using a phase-sensitive model of the acoustic environment and sequential estimation of the corrupting noise
Author :
Deng, Li ; Droppo, Jasha ; Acero, Alex
Author_Institution :
Microsoft Res., Redmond, WA, USA
Volume :
12
Issue :
2
fYear :
2004
fDate :
3/1/2004
Firstpage :
133
Lastpage :
143
Abstract :
This paper presents a novel speech feature enhancement technique based on a probabilistic, nonlinear acoustic environment model that effectively incorporates the phase relationship (hence phase sensitive) between the clean speech and the corrupting noise in the acoustic distortion process. The core of the enhancement algorithm is the MMSE (minimum mean square error) estimator for the log Mel power spectra of clean speech based on the phase-sensitive environment model, using highly efficient single-point, second-order Taylor series expansion to approximate the joint probability of clean and noisy speech modeled as a multivariate Gaussian. Since a noise estimate is required by the MMSE estimator, a high-quality, sequential noise estimation algorithm is also developed and presented. Both the noise estimation and speech feature enhancement algorithms are evaluated on the Aurora2 task of connected digit recognition. Noise-robust speech recognition results demonstrate that the new acoustic environment model which takes into account the relative phase in speech and noise mixing is superior to the earlier environment model which discards the phase under otherwise identical experimental conditions. The results also show that the sequential MAP (maximum a posteriori) learning for noise estimation is better than the sequential ML (maximum likelihood) learning, both evaluated under the identical phase-sensitive MMSE enhancement condition.
Keywords :
Gaussian processes; acoustic distortion; least mean squares methods; maximum likelihood sequence estimation; speech enhancement; speech recognition; Aurora2; MMSE estimator; Taylor series expansion; acoustic distortion process; acoustic environment; clean speech; connected digit recognition; corrupting noise; log Mel power spectra; maximum a posteriori; minimum mean square error estimator; multivariate Gaussian; noise robust speech recognition; noisy speech; phase sensitive model; sequential MAP learning; sequential noise estimation; speech feature enhancement; Acoustic distortion; Acoustic noise; Maximum likelihood estimation; Nonlinear acoustics; Phase estimation; Phase noise; Speech enhancement; Speech processing; Speech recognition; Working environment noise;
fLanguage :
English
Journal_Title :
Speech and Audio Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1063-6676
Type :
jour
DOI :
10.1109/TSA.2003.820201
Filename :
1284341