DocumentCode :
35973
Title :
Robust Log-Energy Estimation and its Dynamic Change Enhancement for In-car Speech Recognition
Author :
Weifeng Li ; Longbiao Wang ; Yicong Zhou ; Herve Bourlard ; Qingmin Liao
Author_Institution :
Dept. of Electron. Eng., Tsinghua Univ., Shenzhen, China
Volume :
21
Issue :
8
fYear :
2013
fDate :
Aug. 2013
Firstpage :
1689
Lastpage :
1698
Abstract :
The log-energy parameter, typically derived from a full-band spectrum, is a critical feature commonly used in automatic speech recognition (ASR) systems. However, log-energy is difficult to estimate reliably in the presence of background noise. In this paper, we theoretically show that background noise affects the trajectories of not only the “conventional” log-energy, but also its delta parameters. This results in a poor estimation of the actual log-energy and its delta parameters, which no longer describe the speech signal. We thus propose a new method to estimate log-energy from a sub-band spectrum, followed by dynamic change enhancement and mean smoothing. We demonstrate the effectiveness of the proposed log-energy estimation and its post-processing steps through speech recognition experiments conducted on the in-car CENSREC-2 database. The proposed log-energy (together with its corresponding delta parameters) yields an average improvement of 32.8% compared with the baseline front-ends. Moreover, it is also shown that further improvement can be achieved by incorporating the new Mel-Frequency Cepstral Coefficients (MFCCs) obtained by non-linear spectral contrast stretching.
Keywords :
noise; speech enhancement; speech recognition; Mel-frequency cepstral coefficients; automatic speech recognition systems; background noise; delta parameters; dynamic change enhancement; full-band spectrum; in-car CENSREC-2 database; in-car speech recognition; log-energy estimation; mean smoothing; nonlinear spectral contrast stretching; robust log-energy estimation; speech signal; sub-band spectrum; log-energy; mel-filterbank (MFB); mel-frequency cepstral coefficients (MFCCs)
fLanguage :
English
Journal_Title :
IEEE Transactions on Audio, Speech, and Language Processing
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2013.2260151
Filename :
6508817