Ideal ratio mask estimation using deep neural networks for robust speech recognition

Author

Narayanan, Arun ; Wang, DeLiang

Author_Institution

Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA

fYear

2013

Firstpage

7092

Lastpage

7096

Abstract

We propose a feature enhancement algorithm to improve robust automatic speech recognition (ASR). The algorithm estimates a smoothed ideal ratio mask (IRM) in the Mel frequency domain using deep neural networks and a set of time-frequency unit level features that has previously been used to estimate the ideal binary mask. The estimated IRM is used to filter out noise from a noisy Mel spectrogram before performing cepstral feature extraction for ASR. On the noisy subset of the Aurora-4 robust ASR corpus, the proposed enhancement obtains a relative improvement of over 38% in terms of word error rates using ASR models trained in clean conditions, and an improvement of over 14% when the models are trained using the multi-condition training data. In terms of instantaneous SNR estimation performance, the proposed system obtains a mean absolute error of less than 4 dB in most frequency channels.
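The masking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the common energy-ratio definition of the IRM (clean energy divided by clean plus noise energy in each time-frequency unit), computes the mask from known clean and noise energies rather than estimating it with a deep neural network, and omits the smoothing mentioned in the abstract. The function names `ideal_ratio_mask`, `apply_mask`, and `mask_to_snr_db` are hypothetical and chosen for illustration only.

```python
import numpy as np


def ideal_ratio_mask(clean_energy, noise_energy, eps=1e-12):
    """Assumed IRM definition: clean / (clean + noise) per time-frequency unit."""
    return clean_energy / (clean_energy + noise_energy + eps)


def apply_mask(noisy_mel_energy, mask):
    """Attenuate each unit of a noisy Mel spectrogram (energy domain) by the mask."""
    return noisy_mel_energy * mask


def mask_to_snr_db(mask, eps=1e-12):
    """Instantaneous SNR in dB implied by a ratio mask: 10 * log10(m / (1 - m))."""
    m = np.clip(mask, eps, 1.0 - eps)  # avoid log(0) and division by zero
    return 10.0 * np.log10(m / (1.0 - m))


# Toy example: one frame, two Mel channels (values are made up for illustration).
clean = np.array([[4.0, 1.0]])
noise = np.array([[4.0, 3.0]])
noisy = clean + noise  # assumes clean and noise energies add

mask = ideal_ratio_mask(clean, noise)
enhanced = apply_mask(noisy, mask)
snr_db = mask_to_snr_db(mask)

# Mean absolute error between an estimated and a reference SNR, per channel.
# The 0.5 dB offset is arbitrary and stands in for an estimator's error.
estimated_snr_db = snr_db + 0.5
mae_per_channel = np.mean(np.abs(estimated_snr_db - snr_db), axis=0)
```

In a full system along the lines of the abstract, the mask would come from a trained network applied to noisy features, and the enhanced Mel spectrogram would then be passed to log compression and cepstral feature extraction for ASR.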

Keywords

feature extraction; frequency-domain analysis; neural nets; speech enhancement; speech recognition; ASR model; Aurora-4 robust ASR corpus; IRM estimation; Mel frequency domain; cepstral feature extraction; deep neural networks; feature enhancement algorithm; ideal binary mask estimation; ideal ratio mask estimation; instantaneous SNR estimation performance; mean absolute error; multicondition training data; noisy Mel spectrogram; robust ASR; robust automatic speech recognition; robust speech recognition; time-frequency unit level features; word error rates; Estimation; Feature extraction; Robustness; Signal to noise ratio; Speech; Speech recognition; Aurora-4; Computational Auditory Scene Analysis; instantaneous SNR; noise robust ASR;

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on

Conference_Location

Vancouver, BC

ISSN

1520-6149

Type

conf

DOI

10.1109/ICASSP.2013.6639038

Filename

6639038