Title :
Combining spectral feature mapping and multi-channel model-based source separation for noise-robust automatic speech recognition
Author :
Deblin Bagchi;Michael I. Mandel;Zhongqiu Wang;Yanzhang He;Andrew Plummer;Eric Fosler-Lussier
Author_Institution :
Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, USA
Abstract :
Automatic speech recognition (ASR) systems suffer severe performance degradation in the presence of complicating factors such as noise, reverberation, multiple speech sources, and multiple recording devices. Previous challenges have sparked much innovation in the design of systems capable of handling these complications. In this spirit, the CHiME-3 challenge presents system builders with the task of recognizing speech in a real-world noisy setting in which speakers talk to an array of 6 microphones in a tablet. To address these issues, we explore the effectiveness of first applying a model-based source separation mask to the output of a beamformer that combines the source signals recorded by each microphone, and then applying a DNN-based front-end spectral mapper that predicts clean filterbank features. The source separation algorithm MESSL (Model-based EM Source Separation and Localization) has been extended from two channels to multiple channels to meet the demands of the challenge. We report on interactions between the two systems, cross-cut by the use of a robust beamforming algorithm called BeamformIt. Evaluations of different system settings reveal that combining MESSL and the spectral mapper on the baseline beamformer algorithm boosts performance substantially.
Keywords :
"Microphones","Source separation","Speech","Noise measurement","Array signal processing","Robustness","Time-frequency analysis"
Conference_Title :
Automatic Speech Recognition and Understanding (ASRU), 2015 IEEE Workshop on
DOI :
10.1109/ASRU.2015.7404836