DocumentCode :
179610
Title :
Single-channel mixed speech recognition using deep neural networks
Author :
Weng, Chao ; Yu, Dong ; Seltzer, Michael L. ; Droppo, Jasha
Author_Institution :
Georgia Inst. of Technol., Atlanta, GA, USA
fYear :
2014
fDate :
4-9 May 2014
Firstpage :
5632
Lastpage :
5636
Abstract :
In this work, we study the problem of single-channel mixed speech recognition using deep neural networks (DNNs). Using a multi-style training strategy on artificially mixed speech data, we investigate several training setups that enable the DNN to generalize to similar patterns in the test data. We also introduce a WFST-based two-talker decoder to work with the trained DNNs. Experiments on the 2006 speech separation and recognition challenge task demonstrate that the proposed DNN-based system is remarkably robust to the interference of a competing speaker. The best setup of our proposed systems achieves an overall word error rate (WER) of 19.7%, which improves upon the result obtained by the state-of-the-art IBM superhuman system by 1.9% absolute, with fewer assumptions and lower computational complexity.
Keywords :
computational complexity; neural nets; speech codecs; speech recognition; DNN-based system; IBM superhuman system; WER; WFST-based two-talker decoder; computational complexity; deep neural networks; mixed speech data; multistyle training strategy; single-channel mixed speech recognition; speech separation; test data; trained DNN; Acoustics; Decoding; Joints; Speech; Speech recognition; Switches; Training; DNN; WFST; multi-talker ASR;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location :
Florence
Type :
conf
DOI :
10.1109/ICASSP.2014.6854681
Filename :
6854681