DocumentCode
19233
Title
Minimum Mean-Square Error Estimation of Mel-Frequency Cepstral Features–A Theoretically Consistent Approach
Author
Jensen, Jens ; Tan, Zheng-Hua
Author_Institution
Dept. of Electron. Syst., Aalborg Univ., Aalborg, Denmark
Volume
23
Issue
1
fYear
2015
fDate
Jan. 2015
Firstpage
186
Lastpage
197
Abstract
In this work, we consider the problem of feature enhancement for noise-robust automatic speech recognition (ASR). We propose a method for minimum mean-square error (MMSE) estimation of mel-frequency cepstral features, which is based on a minimum number of well-established, theoretically consistent statistical assumptions. More specifically, the method belongs to the class of methods relying on the statistical framework proposed in Ephraim and Malah's original work (“Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator,” IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-32, no. 6, 1984). The method is general in that it allows MMSE estimation of mel-frequency cepstral coefficients (MFCC's), cepstral-mean subtracted (CMS-) MFCC's, autoregressive-moving-average (ARMA)-filtered CMS-MFCC's, velocity, and acceleration coefficients. In addition, the method is easily modified to take into account compressive non-linearities other than the logarithm traditionally used for MFCC computation. In terms of MFCC estimation performance, as measured by MFCC mean-square error, the proposed method shows performance which is identical to or better than other state-of-the-art methods. In terms of ASR performance, no statistical difference could be found between the proposed method and the state-of-the-art methods. We conclude that existing state-of-the-art MFCC feature enhancement algorithms within this class of algorithms, while theoretically suboptimal or based on theoretically inconsistent assumptions, perform close to optimally in the MMSE sense.
Keywords
autoregressive moving average processes; cepstral analysis; least mean squares methods; speech recognition; statistical analysis; ARMA coefficients; ASR; CMS coefficients; MFCC estimation performance; MMSE estimation; acceleration coefficients; autoregressive-moving-average coefficients; cepstral-mean subtracted coefficients; feature enhancement; mel-frequency cepstral features; minimum mean-square error estimation; noise-robust automatic speech recognition; statistical assumptions; velocity coefficients; Estimation; Mean square error methods; Mel frequency cepstral coefficient; Noise; Noise measurement; Speech; Robust automatic speech recognition (ASR); mel-frequency cepstral coefficient (MFCC); minimum mean-square error (MMSE) estimation; speech enhancement
fLanguage
English
Journal_Title
Audio, Speech, and Language Processing, IEEE/ACM Transactions on
Publisher
IEEE
ISSN
2329-9290
Type
jour
DOI
10.1109/TASLP.2014.2377591
Filename
7010073