Regional Information Center for Science and Technology - Improving deep neural networks for LVCSR using dropout and shrinking structure

DocumentCode :

180082

Title :

Improving deep neural networks for LVCSR using dropout and shrinking structure

Author :

Shiliang Zhang ; Yebo Bao ; Pan Zhou ; Hui Jiang ; Lirong Dai

Author_Institution :

Nat. Eng. Lab. for Speech & Language Inf. Process., Univ. of Sci. & Technol. of China, Hefei, China

fYear :

2014

fDate :

4-9 May 2014

Firstpage :

6849

Lastpage :

6853

Abstract :

Recently, the hybrid deep neural network and hidden Markov model (DNN/HMMs) approach has achieved dramatic gains over the conventional GMM/HMMs method on various large vocabulary continuous speech recognition (LVCSR) tasks. In this paper, we propose two new methods to further improve the hybrid DNN/HMMs model: i) using dropout as a pre-conditioner (DAP) to initialize the DNN prior to back-propagation (BP) for better recognition accuracy; ii) employing a shrinking DNN structure (sDNN), with hidden layers decreasing in size from bottom to top, to reduce model size and computation time. The proposed DAP method is evaluated on a 70-hour Mandarin transcription (PSC) task and the 309-hour Switchboard (SWB) task. Compared with the traditional greedy layer-wise pre-trained DNN, it achieves about 10% and 6.8% relative recognition error reduction on the PSC and SWB tasks, respectively. We also evaluate sDNN, as well as its combination with DAP, on the SWB task. Experimental results show that these methods can reduce the model to 45% of its original size and accelerate training and testing by 55%, without losing recognition accuracy.
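
The two ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the layer widths, input and output dimensions, and drop probability below are hypothetical values chosen only for illustration, and the dropout routine is the common "inverted dropout" formulation, which may differ from the variant used in the paper. The sketch shows only a dropout mask and a parameter count; it does not reproduce the DAP initialization procedure or any reported result.

```python
import random


def dense_param_count(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the width of every layer from input to output.
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def dropout(activations, drop_prob, rng):
    """Inverted dropout: zero each unit with probability drop_prob and
    rescale the survivors so the expected activation is unchanged."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    keep = 1.0 - drop_prob
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# Hypothetical dimensions, NOT taken from the paper.
N_INPUT, N_OUTPUT = 400, 9000

# Baseline: six hidden layers of equal width.
uniform = [N_INPUT] + [2048] * 6 + [N_OUTPUT]

# Shrinking structure: hidden layers narrow from bottom (input) to top (output).
shrinking = [N_INPUT, 2048, 1792, 1536, 1280, 1024, 768, N_OUTPUT]

uniform_params = dense_param_count(uniform)
shrinking_params = dense_param_count(shrinking)
size_ratio = shrinking_params / uniform_params

# Apply a dropout mask to a toy activation vector with a fixed seed.
rng = random.Random(0)
dropped = dropout([1.0] * 8, drop_prob=0.5, rng=rng)

if __name__ == "__main__":
    print(f"uniform parameters:   {uniform_params}")
    print(f"shrinking parameters: {shrinking_params}")
    print(f"size ratio:           {size_ratio:.3f}")
    print(f"dropout sample:       {dropped}")
```

In this toy setup most of the saving comes from the narrower top hidden layer, because it feeds the large output layer; the actual savings in any real system depend on its own layer widths and output size.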

Keywords :

backpropagation; hidden Markov models; natural language processing; neural nets; speech recognition; DAP; LVCSR tasks; Mandarin transcription task; PSC; SWB task; dropout as preconditioner; hybrid DNN/HMMs model; hybrid deep neural networks; large vocabulary continuous speech recognition tasks; model size reduction; sDNN; shrinking DNN structure; switchboard task; time 309 hr; time 70 hour; Computational modeling; Hidden Markov models; Neural networks; Speech; Speech recognition; Switches; Training; DNN-HMM; LVCSR; deep neural networks; dropout; dropout as pre-conditioner (DAP); shrinking hidden layer

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on

Conference_Location :

Florence

Type :

conf

DOI :

10.1109/ICASSP.2014.6854927

Filename :

6854927

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=180082