DocumentCode :
1685886
Title :
A cluster-based multiple deep neural networks method for large vocabulary continuous speech recognition
Author :
Pan Zhou ; Cong Liu ; Qingfeng Liu ; Lirong Dai ; Hui Jiang
Author_Institution :
Nat. Eng. Lab. of Speech & Language Inf. Process., Univ. of Sci. & Technol. of China, Hefei, China
fYear :
2013
Firstpage :
6650
Lastpage :
6654
Abstract :
Recently, a pre-trained context-dependent hybrid deep neural network (DNN) and HMM method has achieved significant performance gains in many large-scale automatic speech recognition (ASR) tasks. However, the error back-propagation (BP) algorithm for training neural networks is sequential in nature and hard to parallelize across multiple computing threads, so training a deep neural network is extremely time-consuming even with a modern GPU board. In this paper we propose a new acoustic modelling framework that uses multiple DNNs instead of a single DNN to compute the posterior probabilities of tied HMM states. In our method, all tied states of the context-dependent HMMs are first grouped into several disjoint clusters based on the training data associated with these states. Then, several hierarchically structured DNNs are trained separately on these disjoint clusters of data using multiple GPUs. In decoding, the final posterior probability of each tied HMM state is calculated from the output posteriors of the multiple DNNs. We have evaluated the proposed method on a 64-hour Mandarin transcription task and the 309-hour Switchboard Hub5 task. Experimental results show that the new cluster-based multiple-DNN method achieves more than a 5-fold reduction in total training time with only negligible performance degradation (about 1-2% on average) when using 3 or 4 GPUs, respectively.
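The posterior-combination step described in the abstract can be illustrated with a minimal sketch. It assumes a two-level factorization in which a top-level network scores the clusters and each cluster-specific DNN scores only its own tied states, so that p(state|x) = p(cluster|x) * p(state|x, cluster); the function name, the data layout, and this exact factorization are illustrative assumptions, not necessarily the paper's formulation.

```python
import numpy as np

def combine_posteriors(cluster_post, state_posts, state_to_cluster):
    """Hypothetical combination of per-cluster DNN outputs into tied-state posteriors.

    cluster_post:     (C,) posteriors over clusters from a top-level network.
    state_posts:      list of C arrays; entry c holds posteriors over the tied
                      states assigned to cluster c, from that cluster's DNN.
    state_to_cluster: list of C index arrays mapping cluster-local states to
                      global tied-state indices.
    Returns a vector of posteriors over all tied HMM states.
    """
    n_states = sum(len(idx) for idx in state_to_cluster)
    full = np.zeros(n_states)
    for c, idx in enumerate(state_to_cluster):
        # Scale each cluster's local posteriors by the cluster posterior.
        full[idx] = cluster_post[c] * state_posts[c]
    return full

# Toy usage: 3 clusters covering 8 tied states.
state_to_cluster = [np.array([0, 1, 2]), np.array([3, 4]), np.array([5, 6, 7])]
cluster_post = np.array([0.6, 0.3, 0.1])
state_posts = [np.array([0.5, 0.3, 0.2]),
               np.array([0.7, 0.3]),
               np.array([0.4, 0.4, 0.2])]
post = combine_posteriors(cluster_post, state_posts, state_to_cluster)
print(post, post.sum())  # sums to 1 by construction
```

Because each cluster DNN only models the states in its own cluster, the per-cluster networks can be trained independently on separate GPUs, which is the source of the reported training-time reduction.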
Keywords :
backpropagation; decoding; hidden Markov models; pattern clustering; speech recognition; ASR; BP algorithm; Mandarin transcription task; Switchboard Hub5 task; cluster-based multiple deep neural networks method; context-dependent hybrid DNN; decoding; error backpropagation algorithm; hidden Markov model method; large vocabulary continuous speech recognition; large-scale HMM; large-scale automatic speech recognition; modern GPU board; multiple computing threads; posterior probabilities; time 309 hour; time 64 hour; training data; Acoustics; Hidden Markov models; Neural networks; Speech; Speech recognition; Training; Training data; DNN; LVCSR; cluster-based multi-DNN; parallelization among GPUs; state clustering;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location :
Vancouver, BC
ISSN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2013.6638948
Filename :
6638948