Title :
Investigation of Speech Separation as a Front-End for Noise Robust Speech Recognition
Author :
Narayanan, Arun; Wang, DeLiang
Author_Institution :
Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
Abstract :
Recently, supervised classification has been shown to work well for speech separation. We perform an in-depth evaluation of such techniques as a front-end for noise-robust automatic speech recognition (ASR). The proposed separation front-end consists of two stages. The first stage removes additive noise via time-frequency masking. The second stage addresses channel mismatch and the distortions introduced by the first stage: a non-linear function is learned that maps the masked spectral features to their clean counterparts. Results show that the proposed front-end substantially improves ASR performance when the acoustic models are trained in clean conditions. We also propose a diagonal feature discriminant linear regression (dFDLR) adaptation that can be performed on a per-utterance basis for ASR systems employing deep neural networks (DNNs) and hidden Markov models (HMMs). Results show that dFDLR consistently improves performance in all test conditions. Surprisingly, the best average results are obtained when dFDLR is applied to models trained using noisy log-Mel spectral features from the multi-condition training set. With no channel mismatch, the best results are obtained when the proposed speech separation front-end is used along with multi-condition training using log-Mel features, followed by dFDLR adaptation. Both of these results are among the best reported on the Aurora-4 dataset.
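The dFDLR adaptation mentioned in the abstract can be read as learning a per-utterance, per-dimension scale and offset on the input features while the trained acoustic model stays frozen. The Python sketch below illustrates that idea under simplifying assumptions that are not taken from the paper: a single linear softmax layer (W, c) stands in for the frozen DNN, no context window is used, the frame targets are simply given (in a real system they would presumably come from a first-pass decoding), and plain gradient descent with a hypothetical learning rate and step count does the fitting. The names softmax, cross_entropy and adapt_diagonal, as well as the simulated gains and offsets, are illustrative only.

```python
import numpy as np


def softmax(z):
    # Row-wise softmax with max-subtraction for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cross_entropy(feats, labels, W, c):
    # Mean frame-level cross-entropy of the frozen (hypothetical) classifier.
    p = softmax(feats @ W + c)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))


def adapt_diagonal(X, labels, W, c, lr=0.02, steps=300):
    """Fit a per-dimension scale a and shift b so that X * a + b suits the frozen model.

    Only a and b (2 * D numbers) are updated; W and c are never changed.
    """
    T, D = X.shape
    a = np.ones(D)   # start from the identity transform
    b = np.zeros(D)
    onehot = np.eye(W.shape[1])[labels]
    for _ in range(steps):
        Z = X * a + b                              # diagonal (element-wise) transform
        G = (softmax(Z @ W + c) - onehot) / T      # dLoss/dlogits for mean cross-entropy
        dZ = G @ W.T                               # dLoss/dZ, shape (T, D)
        a -= lr * np.sum(dZ * X, axis=0)           # chain rule through Z = X * a + b
        b -= lr * np.sum(dZ, axis=0)
    return a, b


# Toy usage: simulate a channel-like mismatch (per-dimension gain and offset).
rng = np.random.default_rng(0)
T, D, K = 200, 4, 3
X_clean = rng.normal(size=(T, D))
W = rng.normal(size=(D, K))                        # stands in for an already-trained, frozen model
c = np.zeros(K)
labels = np.argmax(X_clean @ W + c, axis=1)        # hypothetical frame targets
X_test = X_clean * np.array([0.5, 2.0, 1.0, 1.5]) + np.array([1.0, -1.0, 0.5, 0.0])

a, b = adapt_diagonal(X_test, labels, W, c)
before = cross_entropy(X_test, labels, W, c)
after = cross_entropy(X_test * a + b, labels, W, c)
print(f"cross-entropy before: {before:.3f}, after: {after:.3f}")
```

A diagonal transform estimates only 2 * D parameters, which is what makes fitting it from the frames of a single utterance plausible; a full D x D matrix would need far more adaptation data.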
Keywords :
distortion; hidden Markov models; neural nets; nonlinear functions; regression analysis; signal classification; signal denoising; speech recognition; ASR systems; Aurora-4 dataset; HMM; acoustic models; additive noise removal; channel mismatch; dFDLR adaptation; deep neural networks; diagonal feature discriminant linear regression adaptation; distortions; masked spectral features; multicondition training set; noise robust speech recognition; noisy log-Mel spectral features; nonlinear function; speech separation front-end; supervised classification; time-frequency masking; Acoustics; Adaptation models; Feature extraction; Noise; Speech; Speech processing; Training; Aurora-4; feature mapping; robust ASR
Journal_Title :
IEEE/ACM Transactions on Audio, Speech, and Language Processing
DOI :
10.1109/TASLP.2014.2305833