• DocumentCode
    3422217
  • Title
    A minimum-mean-square-error noise reduction algorithm on Mel-frequency cepstra for robust speech recognition
  • Author
    Yu, Dong; Deng, Li; Droppo, Jasha; Wu, Jian; Gong, Yifan; Acero, Alex
  • Author_Institution
    Microsoft Corp., Redmond, WA
  • fYear
    2008
  • fDate
    March 31-April 4, 2008
  • Firstpage
    4041
  • Lastpage
    4044
  • Abstract
    We present a non-linear feature-domain noise reduction algorithm based on the minimum mean square error (MMSE) criterion on Mel-frequency cepstra (MFCC) for environment-robust speech recognition. Unlike the MMSE enhancement of the log spectral amplitude proposed by Ephraim and Malah (E&M) (1985), the new algorithm develops a suppression rule that applies directly to the power spectral magnitude of the filter-bank outputs and to the MFCC, which makes it demonstrably more effective for noise-robust speech recognition. The noise variance in the new algorithm contains a significant term, missing in the E&M algorithm, that results from the instantaneous phase asynchrony between clean speech and the mixing noise. Speech recognition experiments on the standard Aurora-3 task show that the algorithm reduces the word error rate by 48% compared with the ICSLP02 baseline, by 26% compared with the cepstral mean normalization baseline, and by 13% compared with the conventional E&M log-MMSE noise suppressor. The new algorithm is also much more efficient than the E&M noise suppressor, because the number of channels in the Mel-frequency filter bank (23 in our case) is much smaller than the number of bins in the FFT domain (256). The results also show that our algorithm performs slightly better than the ETSI advanced front end (AFE) in the well-matched and medium-mismatched settings.
  • Keywords
    channel bank filters; fast Fourier transforms; least mean squares methods; speech processing; speech recognition; FFT; MMSE; Mel-frequency cepstra; Mel-frequency filter bank; log spectral amplitude; minimum-mean-square-error noise reduction algorithm; mixing noise; nonlinear feature-domain noise reduction; power spectral magnitude; robust speech recognition; Cepstral analysis; Channel bank filters; Error analysis; Mean square error methods; Mel frequency cepstral coefficient; Noise reduction; Noise robustness; Phase noise; Speech enhancement; Speech recognition; MFCC; MMSE Estimator; Noise Reduction; Robust ASR; Speech Feature Enhancement;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on
  • Conference_Location
    Las Vegas, NV
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4244-1483-3
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2008.4518541
  • Filename
    4518541
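The abstract contrasts a suppression rule applied to the Mel filter-bank outputs (23 channels) with the FFT-domain E&M suppressor (256 bins). As a rough illustration of what "suppression in the filter-bank domain" means, the following is a minimal sketch under several stated assumptions: it evaluates the classic Ephraim-Malah log-spectral-amplitude gain per Mel channel (this is the E&M baseline rule moved to the Mel domain, NOT the paper's MMSE-MFCC rule, which additionally models the phase-asynchrony term in the noise variance); the per-channel noise power is assumed known; and the a priori SNR uses a crude maximum-likelihood estimate with an assumed -25 dB floor. The names `exp1`, `log_mmse_gain`, and `suppress_mel_channels` are hypothetical helpers, not from the paper.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329


def exp1(x):
    """Exponential integral E1(x) for x > 0; always returns a 1-D array."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    out = np.empty_like(x)
    small = x <= 10.0
    xs = x[small]
    # Power series: E1(x) = -gamma - ln(x) - sum_{k>=1} (-x)^k / (k * k!)
    acc = np.zeros_like(xs)
    term = np.ones_like(xs)  # running value of (-x)^k / k!
    for k in range(1, 80):
        term = term * (-xs) / k
        acc += term / k
    out[small] = -EULER_GAMMA - np.log(xs) - acc
    # Asymptotic expansion for large x: E1(x) ~ e^-x / x * (1 - 1/x + 2/x^2 - 6/x^3)
    xl = x[~small]
    out[~small] = np.exp(-xl) / xl * (1 - 1 / xl + 2 / xl**2 - 6 / xl**3)
    return out


def log_mmse_gain(xi, gamma):
    """Ephraim-Malah (1985) log-spectral-amplitude gain.

    xi: a priori SNR, gamma: a posteriori SNR (both per channel, > 0).
    """
    v = xi * gamma / (1.0 + xi)
    return xi / (1.0 + xi) * np.exp(0.5 * exp1(v))


def suppress_mel_channels(noisy_power, noise_power, xi_min=10 ** (-25 / 10)):
    """Hypothetical helper: apply the E&M gain per Mel filter-bank channel.

    This is NOT the paper's MMSE-MFCC rule: it omits the phase-asynchrony
    term and uses a crude maximum-likelihood a priori SNR estimate,
    floored at xi_min (an assumed -25 dB).
    """
    noisy_power = np.maximum(np.asarray(noisy_power, dtype=float), 1e-12)
    noise_power = np.maximum(np.asarray(noise_power, dtype=float), 1e-12)
    gamma = noisy_power / noise_power       # a posteriori SNR per channel
    xi = np.maximum(gamma - 1.0, xi_min)    # a priori SNR, floored at xi_min
    g = log_mmse_gain(xi, gamma)            # gain on the amplitude scale
    return g**2 * noisy_power               # squared gain applied to power


# Usage: 23 Mel channels (the channel count cited in the abstract), synthetic data.
rng = np.random.default_rng(0)
noise = np.ones(23)
noisy = noise + rng.exponential(5.0, size=23)
clean_est = suppress_mel_channels(noisy, noise)
print(clean_est.shape)  # (23,)
```

Evaluating a gain over 23 Mel channels rather than 256 FFT bins means roughly 11 times fewer gain evaluations per frame (256 / 23 ≈ 11.1), which is the efficiency argument the abstract makes.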