• DocumentCode
    19233
  • Title

    Minimum Mean-Square Error Estimation of Mel-Frequency Cepstral Features–A Theoretically Consistent Approach

  • Author

    Jensen, Jens; Tan, Zheng-Hua

  • Author_Institution
    Dept. of Electron. Syst., Aalborg Univ., Aalborg, Denmark
  • Volume
    23
  • Issue
    1
  • fYear
    2015
  • fDate
    Jan. 2015
  • Firstpage
    186
  • Lastpage
    197
  • Abstract
    In this work, we consider the problem of feature enhancement for noise-robust automatic speech recognition (ASR). We propose a method for minimum mean-square error (MMSE) estimation of mel-frequency cepstral features, which is based on a minimum number of well-established, theoretically consistent statistical assumptions. More specifically, the method belongs to the class of methods relying on the statistical framework proposed in Ephraim and Malah's original work (“Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator,” IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-32, no. 6, 1984). The method is general in that it allows MMSE estimation of mel-frequency cepstral coefficients (MFCCs), cepstral-mean subtracted (CMS-) MFCCs, autoregressive-moving-average (ARMA)-filtered CMS-MFCCs, velocity, and acceleration coefficients. In addition, the method is easily modified to take into account compressive non-linearities other than the logarithm traditionally used for MFCC computation. In terms of MFCC estimation performance, as measured by MFCC mean-square error, the proposed method performs as well as or better than other state-of-the-art methods. In terms of ASR performance, no statistical difference could be found between the proposed method and the state-of-the-art methods. We conclude that existing state-of-the-art MFCC feature enhancement algorithms within this class of algorithms, while theoretically suboptimal or based on theoretically inconsistent assumptions, perform close to optimally in the MMSE sense.
  • Keywords
    autoregressive moving average processes; cepstral analysis; least mean squares methods; speech recognition; statistical analysis; ARMA coefficients; ASR; CMS coefficients; MFCC estimation performance; MMSE estimation; acceleration coefficients; autoregressive-moving-average coefficients; cepstral-mean subtracted coefficients; feature enhancement; mel-frequency cepstral features; minimum mean-square error estimation; noise-robust automatic speech recognition; statistical assumptions; velocity coefficients; Estimation; Mean square error methods; Mel frequency cepstral coefficient; Noise; Noise measurement; Speech; Robust automatic speech recognition (ASR); mel-frequency cepstral coefficient (MFCC); minimum mean-square error (MMSE) estimation; speech enhancement
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    2329-9290
  • Type

    jour

  • DOI
    10.1109/TASLP.2014.2377591
  • Filename
    7010073