Title :
Joint constrained maximum likelihood regression for overlapping speech recognition
Author :
Kumatani, Kenichi ; Singh, Rajdeep ; Faubel, Friedrich ; McDonough, John ; Oualil, Youssef
Author_Institution :
Spansion Inc., Sunnyvale, CA, USA
Abstract :
Adaptation techniques for speech recognition are very effective in single-speaker scenarios. However, when distant microphones capture overlapping speech from multiple speakers, conventional speaker adaptation methods are less effective. The putative signal for any speaker contains interference from other speakers. Consequently, any adaptation technique adapts the model to the interfering speakers as well, which degrades recognition performance for the desired speaker. In this work, we develop a new feature-space adaptation method for overlapping speech. We first build a beamformer to enhance speech from each active speaker. We then compute speech feature vectors from the output of each beamformer. Finally, we jointly transform the feature vectors from all speakers to maximize the likelihood of their respective acoustic models. Experiments on the speech separation challenge data collected under the AMI project demonstrate the effectiveness of our adaptation method. An absolute word error rate (WER) reduction of up to 14% was achieved in the case of delay-and-sum beamforming. With minimum mutual information (MMI) beamforming, our adaptation method achieved a WER of 31.5%. To the best of our knowledge, this is the lowest WER reported on this task.
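The delay-and-sum beamforming mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `delay_and_sum`, the use of integer sample delays, and the synthetic three-microphone signal are all assumptions made for illustration. Each channel is advanced by its known arrival delay so that the desired speaker's signal lines up across microphones, and the aligned channels are then averaged.

```python
import numpy as np

def delay_and_sum(channels, delays):
    """Align each channel by its integer sample delay, then average.

    channels: array of shape (num_mics, num_samples)
    delays:   per-microphone arrival delay in samples (non-negative ints),
              assumed known here; in practice they come from the array
              geometry and an estimate of the speaker position
    """
    channels = np.asarray(channels, dtype=float)
    num_mics, num_samples = channels.shape
    out = np.zeros(num_samples)
    for mic, d in zip(channels, delays):
        # Advance the channel by d samples; the last d output samples
        # receive no contribution from this microphone.
        out[:num_samples - d] += mic[d:]
    return out / num_mics

# Synthetic example (assumed data): one source reaching three microphones
# with different delays.
rng = np.random.default_rng(0)
source = rng.standard_normal(1000)
delays = [0, 3, 7]
mics = np.stack([np.concatenate([np.zeros(d), source])[:1000] for d in delays])
enhanced = delay_and_sum(mics, delays)
```

In this idealized, noise-free setting the aligned average reproduces the source wherever all three channels overlap. With a second, interfering speaker the interference would be attenuated rather than removed, which is the residual overlap the abstract's joint adaptation is meant to address.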
Keywords :
array signal processing; error statistics; maximum likelihood estimation; microphones; regression analysis; speech recognition; MMI beamforming; WER reduction; acoustic models; active speaker; adaptation techniques; beamformer; conventional speaker adaptation methods; delay-and-sum beamforming; distant microphones; feature-space adaptation method; interfering speakers; joint constrained maximum likelihood regression; minimum mutual information beamforming; multiple speakers; overlapping speech recognition; putative signal; recognition performance; single-speaker scenarios; speech feature vectors; speech separation challenge data; word error rate reduction; Adaptation models; Array signal processing; Hidden Markov models; Microphones; Speech; Speech recognition; Vectors; distant speech recognition; feature-space adaptation; microphone array; overlapping speech; speech separation;
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location :
Vancouver, BC
DOI :
10.1109/ICASSP.2013.6637621