Regional Information Center for Science and Technology (RICeST) - Minimum Mean-Square Error Estimation of Mel-Frequency Cepstral Features

DocumentCode :

19233

Title :

Minimum Mean-Square Error Estimation of Mel-Frequency Cepstral Features–A Theoretically Consistent Approach

Author :

Jensen, Jens ; Zheng-Hua Tan

Author_Institution :

Dept. of Electron. Syst., Aalborg Univ., Aalborg, Denmark

Volume :

Issue :

fYear :

2015

fDate :

Jan. 2015

Firstpage :

186

Lastpage :

197

Abstract :

In this work, we consider the problem of feature enhancement for noise-robust automatic speech recognition (ASR). We propose a method for minimum mean-square error (MMSE) estimation of mel-frequency cepstral features that relies on a minimal set of well-established, theoretically consistent statistical assumptions. More specifically, the method belongs to the class of methods built on the statistical framework proposed in Ephraim and Malah's original work ("Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-32, no. 6, 1984). The method is general in that it allows MMSE estimation of mel-frequency cepstral coefficients (MFCCs), cepstral-mean-subtracted (CMS) MFCCs, autoregressive-moving-average (ARMA)-filtered CMS-MFCCs, and velocity and acceleration coefficients. In addition, the method is easily modified to accommodate compressive non-linearities other than the logarithm traditionally used for MFCC computation. In terms of MFCC estimation performance, as measured by MFCC mean-square error, the proposed method performs identically to or better than other state-of-the-art methods. In terms of ASR performance, no statistically significant difference was found between the proposed method and the state-of-the-art methods. We conclude that existing state-of-the-art MFCC feature enhancement algorithms within this class, although theoretically suboptimal or based on theoretically inconsistent assumptions, perform close to optimally in the MMSE sense.
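Illustrative sketch (not the authors' implementation): one way to see why a single estimator can cover MFCCs, CMS-MFCCs, velocity and acceleration coefficients is that all of them are linear maps (DCT, mean subtraction over frames, regression deltas) of the compressed mel-band energies, and the MMSE estimate of a linear map is that map applied to the MMSE estimate. ARMA filtering along time is likewise linear and is omitted here for brevity. The sketch below assumes an Ephraim-Malah style model (independent zero-mean complex Gaussian speech and noise DFT coefficients with known variances), assumes frames are independent, and evaluates E[compress(e_m) | Y] by Monte Carlo sampling, a swapped-in technique that is not necessarily how the paper computes it. All function names (`mmse_compressed_mel`, `dct2`, `cms`, `deltas`, `mmse_features`) are hypothetical.

```python
import math
import random


def mmse_compressed_mel(Y, lam_s, lam_n, weights, compress=math.log,
                        n_draws=2000, seed=0):
    """Monte Carlo estimate of E[compress(e_m) | Y] for one frame.

    Assumed model: Y_k = S_k + N_k with independent S_k ~ CN(0, lam_s[k]) and
    N_k ~ CN(0, lam_n[k]); e_m = sum_k weights[m][k] * |S_k|^2.
    """
    rng = random.Random(seed)
    # Posterior of S_k given Y_k is complex Gaussian: Wiener-gain mean, fixed variance.
    means = [ls / (ls + ln) * y for ls, ln, y in zip(lam_s, lam_n, Y)]
    post_var = [ls * ln / (ls + ln) for ls, ln in zip(lam_s, lam_n)]
    acc = [0.0] * len(weights)
    for _ in range(n_draws):
        power = []
        for mu, v in zip(means, post_var):
            sd = math.sqrt(v / 2.0)  # split the complex variance over re/im parts
            s = mu + complex(rng.gauss(0.0, sd), rng.gauss(0.0, sd))
            power.append(abs(s) ** 2)
        for m, w in enumerate(weights):
            acc[m] += compress(sum(wk * p for wk, p in zip(w, power)))
    return [a / n_draws for a in acc]


def dct2(x, n_ceps):
    """Unnormalized DCT-II: c_i = sum_m x_m * cos(pi * i * (m + 0.5) / M)."""
    M = len(x)
    return [sum(x[m] * math.cos(math.pi * i * (m + 0.5) / M) for m in range(M))
            for i in range(n_ceps)]


def cms(frames):
    """Cepstral mean subtraction: remove the per-coefficient mean over frames."""
    mean = [sum(col) / len(frames) for col in zip(*frames)]
    return [[v - mu for v, mu in zip(f, mean)] for f in frames]


def deltas(frames, N=2):
    """Regression deltas over +/-N frames, replicating the edge frames."""
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    return [[sum(n * (frames[min(T - 1, t + n)][d] - frames[max(0, t - n)][d])
                 for n in range(1, N + 1)) / denom
             for d in range(len(frames[0]))]
            for t in range(T)]


def mmse_features(noisy_frames, lam_s_frames, lam_n, weights, n_ceps=13,
                  compress=math.log):
    # Every step after the expectation is linear, so (by linearity of
    # conditional expectation) each output is the MMSE estimate of that feature.
    est = [mmse_compressed_mel(Y, ls, lam_n, weights, compress)
           for Y, ls in zip(noisy_frames, lam_s_frames)]
    mfcc = [dct2(x, n_ceps) for x in est]
    cms_mfcc = cms(mfcc)
    velocity = deltas(cms_mfcc)
    acceleration = deltas(velocity)
    return mfcc, cms_mfcc, velocity, acceleration
```

Passing, for example, `compress=lambda e: e ** (1 / 3)` swaps the logarithm for a power-law compression without touching the linear stages. Replacing the Monte Carlo average with the plug-in value `compress(E[e_m | Y])` would be cheaper, but for the logarithm Jensen's inequality implies that the plug-in value can never be smaller than E[log e_m | Y], so it is biased upward.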

Keywords :

autoregressive moving average processes; cepstral analysis; least mean squares methods; speech recognition; statistical analysis; ARMA coefficients; ASR; CMS coefficients; MFCC estimation performance; MMSE estimation; acceleration coefficients; autoregressive-moving-average coefficients; cepstral-mean subtracted coefficients; feature enhancement; mel-frequency cepstral features; minimum mean-square error estimation; noise-robust automatic speech recognition; statistical assumptions; velocity coefficients; Estimation; Mean square error methods; Mel frequency cepstral coefficient; Noise; Noise measurement; Speech; Robust automatic speech recognition (ASR); mel-frequency cepstral coefficient (MFCC); minimum mean-square error (MMSE) estimation; speech enhancement;

fLanguage :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE/ACM Transactions on

Publisher :

IEEE

ISSN :

2329-9290

Type :

jour

DOI :

10.1109/TASLP.2014.2377591

Filename :

7010073

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=19233