DocumentCode :
19233
Title :
Minimum Mean-Square Error Estimation of Mel-Frequency Cepstral Features–A Theoretically Consistent Approach
Author :
Jensen, Jens ; Tan, Zheng-Hua
Author_Institution :
Dept. of Electron. Syst., Aalborg Univ., Aalborg, Denmark
Volume :
23
Issue :
1
fYear :
2015
fDate :
Jan. 2015
Firstpage :
186
Lastpage :
197
Abstract :
In this work, we consider the problem of feature enhancement for noise-robust automatic speech recognition (ASR). We propose a method for minimum mean-square error (MMSE) estimation of mel-frequency cepstral features, which is based on a minimum number of well-established, theoretically consistent statistical assumptions. More specifically, the method belongs to the class of methods relying on the statistical framework proposed in Ephraim and Malah's original work (“Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator,” IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-32, no. 6, 1984). The method is general in that it allows MMSE estimation of mel-frequency cepstral coefficients (MFCCs), cepstral-mean subtracted (CMS-) MFCCs, autoregressive-moving-average (ARMA)-filtered CMS-MFCCs, velocity, and acceleration coefficients. In addition, the method is easily modified to take into account other compressive non-linearities than the logarithm traditionally used for MFCC computation. In terms of MFCC estimation performance, as measured by MFCC mean-square error, the proposed method shows performance which is identical to or better than other state-of-the-art methods. In terms of ASR performance, no statistical difference could be found between the proposed method and the state-of-the-art methods. We conclude that existing state-of-the-art MFCC feature enhancement algorithms within this class of algorithms, while theoretically suboptimal or based on theoretically inconsistent assumptions, perform close to optimally in the MMSE sense.
Keywords :
autoregressive moving average processes; cepstral analysis; least mean squares methods; speech recognition; statistical analysis; ARMA coefficients; ASR; CMS coefficients; MFCC estimation performance; MMSE estimation; acceleration coefficients; autoregressive-moving-average coefficients; cepstral-mean subtracted coefficients; feature enhancement; mel-frequency cepstral features; minimum mean-square error estimation; noise-robust automatic speech recognition; statistical assumptions; velocity coefficients; Estimation; Mean square error methods; Mel frequency cepstral coefficient; Noise; Noise measurement; Speech; Robust automatic speech recognition (ASR); mel-frequency cepstral coefficient (MFCC); minimum mean-square error (MMSE) estimation; speech enhancement;
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
2329-9290
Type :
jour
DOI :
10.1109/TASLP.2014.2377591
Filename :
7010073