Title :
Coupling binary masking and robust ASR
Author :
Narayanan, Arun ; Wang, DeLiang
Author_Institution :
Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
Abstract :
We present a novel framework for performing speech separation and robust automatic speech recognition (ASR) in a unified fashion. Separation is performed by estimating the ideal binary mask (IBM), which identifies speech-dominant and noise-dominant units in a time-frequency (T-F) representation of the noisy signal. ASR is performed on cepstral features extracted after binary masking. Previous systems perform these steps sequentially: separation followed by recognition. The proposed framework, which we call bidirectional speech decoding (BSD), unifies these two stages. It does this by using multiple IBM estimators, each of which is designed specifically for a back-end acoustic phonetic unit (BPU) of the recognizer. The standard ASR decoder is modified to use these IBM estimators to obtain BPU-specific cepstra during likelihood calculation. On the Aurora-4 robust ASR task, the proposed framework obtains a relative improvement of 17% in word error rate over the noisy baseline. It also obtains significant improvements in the quality of the estimated IBM.
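The notion of an ideal binary mask used in the abstract can be illustrated with a minimal sketch. It assumes the common definition in which a T-F unit is labeled speech-dominant (1) when its local SNR exceeds a threshold, here 0 dB, and noise-dominant (0) otherwise. The function names `ideal_binary_mask` and `apply_mask`, the `lc_db` parameter, and the toy energy values are hypothetical and are not taken from the paper. The sketch computes the oracle mask from known speech and noise energies; it does not implement the BPU-specific mask estimators or the modified decoder that the abstract describes.

```python
import math


def ideal_binary_mask(speech_energy, noise_energy, lc_db=0.0):
    """Label each T-F unit 1 (speech-dominant) or 0 (noise-dominant).

    speech_energy, noise_energy: equal-shaped lists of rows holding the
    premixed speech and noise energy of each T-F unit.
    lc_db: local SNR threshold in dB (assumed to be 0 dB here).
    """
    mask = []
    for s_row, n_row in zip(speech_energy, noise_energy):
        row = []
        for s, n in zip(s_row, n_row):
            if s <= 0.0:
                row.append(0)  # no speech energy: noise-dominant
            elif n <= 0.0:
                row.append(1)  # speech but no noise: speech-dominant
            else:
                snr_db = 10.0 * math.log10(s / n)
                row.append(1 if snr_db > lc_db else 0)
        mask.append(row)
    return mask


def apply_mask(noisy_energy, mask):
    """Zero out the noise-dominant units of the noisy T-F representation."""
    return [
        [e * m for e, m in zip(e_row, m_row)]
        for e_row, m_row in zip(noisy_energy, mask)
    ]


# Toy 2x2 T-F grid (made-up values, for illustration only).
speech = [[4.0, 1.0], [0.5, 9.0]]
noise = [[1.0, 2.0], [0.5, 3.0]]
noisy = [[5.0, 3.0], [1.0, 12.0]]

mask = ideal_binary_mask(speech, noise)
masked = apply_mask(noisy, mask)
```

In a recognizer front end, cepstral features would then be computed from the masked representation; that step is omitted here.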
Keywords :
estimation theory; speech coding; speech intelligibility; speech recognition; Aurora-4 robust ASR task; BPU-specific cepstra; BSD; automatic speech recognition; back-end acoustic phonetic unit; bidirectional speech decoding; binary masking; cepstral feature; ideal binary mask; multiple IBM estimator; noise dominant unit; speech dominant unit; speech separation; standard ASR decoder; time-frequency representation; word error rate; Decoding; Estimation; Feature extraction; Noise; Noise measurement; Speech; Speech recognition; Aurora-4; Computational Auditory Scene Analysis; bidirectional speech decoder; noise robust ASR;
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location :
Vancouver, BC
DOI :
10.1109/ICASSP.2013.6638982