DocumentCode :
730679
Title :
Joint training of front-end and back-end deep neural networks for robust speech recognition
Author :
Tian Gao ; Jun Du ; Li-Rong Dai ; Chin-Hui Lee
Author_Institution :
Univ. of Sci. & Technol. of China, Hefei, China
fYear :
2015
fDate :
19-24 April 2015
Firstpage :
4375
Lastpage :
4379
Abstract :
Based on the recently proposed speech pre-processing front-end with deep neural networks (DNNs), we first investigate different feature mappings directly from noisy speech via DNN for robust speech recognition. Next, we propose to jointly train a single DNN for both feature mapping and acoustic modeling. Finally, we show that the word error rate (WER) of the jointly trained system can be significantly reduced by fusing multiple DNN pre-processing systems, which implies that features obtained from different domains of the DNN-enhanced speech signals are strongly complementary. Tested on the Aurora4 noisy speech recognition task, our best system with multi-condition training achieves an average WER of 10.3%, a relative reduction of 16.3% over our previous DNN pre-processing-only system, which had a WER of 12.3%. To the best of our knowledge, this is the best published result on the Aurora4 task without using any adaptation techniques.
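The 16.3% figure in the abstract is a relative reduction, not an absolute one: the absolute drop is 12.3 - 10.3 = 2.0 WER points, which is then divided by the baseline WER. The short Python sketch below checks that arithmetic using only the two WERs quoted above. The helper name `relative_reduction` is hypothetical and is not taken from the paper.

```python
def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Hypothetical helper: relative WER reduction in percent.

    Computes (baseline - new) / baseline * 100, i.e. the absolute
    drop in WER expressed as a fraction of the baseline WER.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer * 100.0


# WERs quoted in the abstract: 12.3% for the DNN pre-processing-only
# system and 10.3% for the best jointly trained system with fusion.
absolute_drop = 12.3 - 10.3  # about 2.0 WER points
reduction = relative_reduction(12.3, 10.3)
print(f"{reduction:.1f}%")  # prints "16.3%"
```

A 2.0-point absolute gain corresponds to roughly 16.3% relative because the baseline is 12.3%. The same absolute gain on a higher-WER baseline would produce a smaller relative figure.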
Keywords :
neural nets; speech recognition; Aurora4 noisy speech recognition; DNN; WER; acoustic modeling; back end deep neural networks; feature mapping; front end deep neural networks; joint training; robust speech recognition; word error rate; Acoustics; Hidden Markov models; Joints; Noise measurement; Speech; Speech recognition; Training; deep neural network; feature mapping; joint training; robust speech recognition; system fusion;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
Conference_Location :
South Brisbane, QLD
Type :
conf
DOI :
10.1109/ICASSP.2015.7178797
Filename :
7178797