• DocumentCode
    1688392
  • Title
    Ideal ratio mask estimation using deep neural networks for robust speech recognition
  • Author
    Narayanan, Arun; Wang, DeLiang
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
  • fYear
    2013
  • Firstpage
    7092
  • Lastpage
    7096
  • Abstract
    We propose a feature enhancement algorithm to improve robust automatic speech recognition (ASR). The algorithm estimates a smoothed ideal ratio mask (IRM) in the Mel frequency domain using deep neural networks and a set of time-frequency unit-level features that has previously been used to estimate the ideal binary mask. The estimated IRM is used to filter out noise from a noisy Mel spectrogram before cepstral features are extracted for ASR. On the noisy subset of the Aurora-4 robust ASR corpus, the proposed enhancement obtains a relative word error rate improvement of over 38% with ASR models trained in clean conditions, and an improvement of over 14% when the models are trained on the multi-condition training data. For instantaneous SNR estimation, the proposed system obtains a mean absolute error of less than 4 dB in most frequency channels.
  • Keywords
    feature extraction; frequency-domain analysis; neural nets; speech enhancement; speech recognition; ASR model; Aurora-4 robust ASR corpus; IRM estimation; Mel frequency domain; cepstral feature extraction; deep neural networks; feature enhancement algorithm; ideal binary mask estimation; ideal ratio mask estimation; instantaneous SNR estimation performance; mean absolute error; multicondition training data; noisy Mel spectrogram; robust ASR; robust automatic speech recognition; robust speech recognition; time-frequency unit level features; word error rates; Estimation; Feature extraction; Robustness; Signal to noise ratio; Speech; Speech recognition; Aurora-4; Computational Auditory Scene Analysis; instantaneous SNR; noise robust ASR;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
  • Conference_Location
    Vancouver, BC
  • ISSN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2013.6639038
  • Filename
    6639038
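The abstract outlines a masking pipeline: estimate an IRM over Mel time-frequency units, multiply it into the noisy Mel spectrogram, then extract cepstral features; the mask also implies an instantaneous SNR per unit. The Python sketch below illustrates those steps under stated assumptions. It does not reproduce the DNN estimator or the paper's exact (smoothed) mask: it assumes the IRM is the unsmoothed energy ratio S/(S+N), with speech and noise energies adding in each Mel channel, and it uses an orthonormal DCT-II on log Mel energies as a generic MFCC-style cepstral step. All function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows are frames, columns are Mel channels


def ideal_ratio_mask(speech: Matrix, noise: Matrix) -> Matrix:
    """Assumed IRM definition: speech energy / (speech + noise energy) per T-F unit.
    No smoothing or compression exponent is applied here."""
    return [[s / (s + n) if s + n > 0 else 0.0 for s, n in zip(s_row, n_row)]
            for s_row, n_row in zip(speech, noise)]


def apply_mask(noisy: Matrix, mask: Matrix) -> Matrix:
    """Filter the noisy Mel spectrogram by scaling each unit with its mask value."""
    return [[y * m for y, m in zip(y_row, m_row)] for y_row, m_row in zip(noisy, mask)]


def mask_to_snr_db(mask: Matrix, eps: float = 1e-8) -> Matrix:
    """Instantaneous SNR implied by an energy-ratio mask: 10*log10(m / (1 - m)).
    Exact only under the S/(S+N) assumption above; eps guards m = 0 or m = 1."""
    return [[10.0 * math.log10(max(m, eps) / max(1.0 - m, eps)) for m in row]
            for row in mask]


def per_channel_mae(estimated: Matrix, reference: Matrix) -> List[float]:
    """Mean absolute error per frequency channel, averaged over frames."""
    frames = len(reference)
    channels = len(reference[0])
    return [sum(abs(estimated[t][c] - reference[t][c]) for t in range(frames)) / frames
            for c in range(channels)]


def cepstra(mel: Matrix, num_ceps: int, floor: float = 1e-10) -> Matrix:
    """MFCC-style cepstra: log-compress each frame, then apply an orthonormal DCT-II."""
    result = []
    for frame in mel:
        logs = [math.log(max(e, floor)) for e in frame]  # floor avoids log(0)
        k_count = len(logs)
        row = []
        for q in range(num_ceps):
            scale = math.sqrt(1.0 / k_count) if q == 0 else math.sqrt(2.0 / k_count)
            row.append(scale * sum(v * math.cos(math.pi * q * (k + 0.5) / k_count)
                                   for k, v in enumerate(logs)))
        result.append(row)
    return result


if __name__ == "__main__":
    speech = [[4.0, 1.0]]
    noise = [[1.0, 1.0]]
    noisy = [[5.0, 2.0]]  # additive energies assumed: noisy = speech + noise
    mask = ideal_ratio_mask(speech, noise)
    enhanced = apply_mask(noisy, mask)
    print(mask, enhanced, mask_to_snr_db(mask), cepstra(enhanced, 2))
```

With speech energies [4, 1] and noise energies [1, 1] in one frame, `ideal_ratio_mask` gives [0.8, 0.5], masking the mixture [5, 2] recovers [4, 1], and `mask_to_snr_db` gives about 6.02 dB and 0 dB. Under the energy-ratio assumption the mask-to-SNR inversion is exact, which is why an SNR error in dB, averaged per channel with `per_channel_mae`, can serve as a measure of mask estimation quality; a smoothed or compressed mask would need a correspondingly adjusted mapping.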