DocumentCode :
1685886
Title :
A cluster-based multiple deep neural networks method for large vocabulary continuous speech recognition
Author :
Pan Zhou ; Cong Liu ; Qingfeng Liu ; Lirong Dai ; Hui Jiang
Author_Institution :
Nat. Eng. Lab. of Speech & Language Inf. Process., Univ. of Sci. & Technol. of China, Hefei, China
fYear :
2013
Firstpage :
6650
Lastpage :
6654
Abstract :
Recently, a pre-trained context-dependent hybrid deep neural network (DNN) and HMM method has achieved significant performance gains in many large-scale automatic speech recognition (ASR) tasks. However, the error back-propagation (BP) algorithm for training neural networks is sequential in nature and hard to parallelize across multiple computing threads, so training a deep neural network is extremely time-consuming even with a modern GPU board. In this paper we propose a new acoustic modelling framework that uses multiple DNNs instead of a single DNN to compute the posterior probabilities of tied HMM states. In our method, all tied states of the context-dependent HMMs are first grouped into several disjoint clusters based on the training data associated with these states. Then, several hierarchically structured DNNs are trained separately on these disjoint clusters of data using multiple GPUs. In decoding, the final posterior probability of each tied HMM state is calculated from the output posteriors of the multiple DNNs. We have evaluated the proposed method on a 64-hour Mandarin transcription task and the 309-hour Switchboard Hub5 task. Experimental results show that the new cluster-based multiple-DNN method achieves more than a 5-fold reduction in total training time with only negligible performance degradation (about 1-2% on average) when using 3 or 4 GPUs, respectively.
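The posterior-combination step described in the abstract can be illustrated with a minimal sketch. It assumes a two-level factorization in which a top-level network scores the clusters and each cluster-specific DNN scores only its own tied states, so that p(state|x) = p(cluster|x) * p(state|x, cluster); the function name, the data layout, and this exact factorization are illustrative assumptions, not necessarily the paper's formulation.

```python
import numpy as np

def combine_posteriors(cluster_post, state_posts, state_to_cluster):
    """Hypothetical combination of per-cluster DNN outputs into tied-state posteriors.

    cluster_post:     (C,) posteriors over clusters from a top-level network.
    state_posts:      list of C arrays; entry c holds posteriors over the tied
                      states assigned to cluster c, from that cluster's DNN.
    state_to_cluster: list of C index arrays mapping cluster-local states to
                      global tied-state indices.
    Returns a vector of posteriors over all tied HMM states.
    """
    n_states = sum(len(idx) for idx in state_to_cluster)
    full = np.zeros(n_states)
    for c, idx in enumerate(state_to_cluster):
        # Scale each cluster's local posteriors by the cluster posterior.
        full[idx] = cluster_post[c] * state_posts[c]
    return full

# Toy usage: 3 clusters covering 8 tied states.
state_to_cluster = [np.array([0, 1, 2]), np.array([3, 4]), np.array([5, 6, 7])]
cluster_post = np.array([0.6, 0.3, 0.1])
state_posts = [np.array([0.5, 0.3, 0.2]),
               np.array([0.7, 0.3]),
               np.array([0.4, 0.4, 0.2])]
post = combine_posteriors(cluster_post, state_posts, state_to_cluster)
print(post, post.sum())  # sums to 1 by construction
```

Because each cluster DNN only models the states in its own cluster, the per-cluster networks can be trained independently on separate GPUs, which is the source of the reported training-time reduction.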
Keywords :
backpropagation; decoding; hidden Markov models; pattern clustering; speech recognition; ASR; BP algorithm; Mandarin transcription task; Switchboard Hub5 task; cluster-based multiple deep neural networks method; context-dependent hybrid DNN; decoding; error backpropagation algorithm; hidden Markov model method; large vocabulary continuous speech recognition; large-scale HMM; large-scale automatic speech recognition; modern GPU board; multiple computing threads; posterior probabilities; time 309 hour; time 64 hour; training data; Acoustics; Hidden Markov models; Neural networks; Speech; Speech recognition; Training; Training data; DNN; LVCSR; cluster-based multi-DNN; parallelization among GPUs; state clustering;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location :
Vancouver, BC
ISSN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2013.6638948
Filename :
6638948