Title :
Joint noise adaptive training for robust automatic speech recognition
Author :
Narayanan, Arun ; Wang, DeLiang
Author_Institution :
Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
Abstract :
We explore time-frequency masking to improve noise-robust automatic speech recognition. Apart from its use as a front end, we use it to provide smooth estimates of speech and noise, which are then passed as additional features to a deep neural network (DNN) based acoustic model. Such a system improves performance on the Aurora-4 dataset by 10.5% (relative) compared to the previous best published results. By formulating separation as a supervised mask estimation problem, we develop a unified DNN framework that jointly improves separation and acoustic modeling. Our final system outperforms the previous best system on the CHiME-2 corpus by 22.1% (relative).
Keywords :
neural nets; speech recognition; time-frequency analysis; Aurora-4 dataset; CHiME-2 corpus; DNN based acoustic model; deep neural network; joint noise adaptive training; noise robust automatic speech recognition; noise smooth estimates; speech smooth estimates; supervised mask estimation problem; time-frequency masking; unified DNN framework; Acoustics; Joints; Noise; Noise measurement; Speech; Speech recognition; Training; Aurora-4; CHiME-2; noise robustness
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location :
Florence
DOI :
10.1109/ICASSP.2014.6854051