DocumentCode :
1697151
Title :
Improving deep neural networks for LVCSR using rectified linear units and dropout
Author :
Dahl, George E. ; Sainath, Tara N. ; Hinton, Geoffrey E.
Author_Institution :
Dept. of Comput. Sci., Univ. of Toronto, Toronto, ON, Canada
fYear :
2013
Firstpage :
8609
Lastpage :
8613
Abstract :
Recently, pre-trained deep neural networks (DNNs) have outperformed traditional acoustic models based on Gaussian mixture models (GMMs) on a variety of large vocabulary speech recognition benchmarks. Deep neural nets have also achieved excellent results on various computer vision tasks using a random “dropout” procedure that drastically reduces generalization error by randomly omitting a fraction of the hidden units in all layers. Since dropout helps avoid over-fitting, it has also been successful on a small-scale phone recognition task using larger neural nets. However, training deep neural net acoustic models for large vocabulary speech recognition takes a very long time and dropout is likely to only increase training time. Neural networks with rectified linear unit (ReLU) non-linearities have been highly successful for computer vision tasks and have proved faster to train than standard sigmoid units, sometimes also improving discriminative performance. In this work, we show on a 50-hour English Broadcast News task that modified deep neural networks using ReLUs trained with dropout during frame level training provide a 4.2% relative improvement over a DNN trained with sigmoid units, and a 14.4% relative improvement over a strong GMM/HMM system. We were able to obtain our results with minimal human hyper-parameter tuning using publicly available Bayesian optimization code.
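As a minimal sketch (not code from the paper), the NumPy snippet below illustrates the two ingredients the abstract combines: a rectified linear hidden layer and dropout, which omits each hidden unit with some probability during training. The layer sizes, the 0.5 dropout rate, and the "inverted" train-time rescaling are illustrative assumptions; the paper's formulation instead scales the weights down at test time.

import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # Rectified linear unit: max(0, x), applied element-wise.
    return np.maximum(0.0, x)

def dropout(h, rate, training):
    if not training:
        return h  # at test time all units are kept
    # Randomly omit a fraction `rate` of hidden units; rescale the
    # survivors so expected activations match test time (an assumption
    # of this sketch, not the paper's test-time weight scaling).
    mask = rng.random(h.shape) >= rate
    return h * mask / (1.0 - rate)

# One ReLU hidden layer on a dummy batch of acoustic feature frames.
x = rng.standard_normal((4, 40))          # 4 frames, 40-dim features
W = rng.standard_normal((40, 128)) * 0.1  # hypothetical layer weights
b = np.zeros(128)
h = dropout(relu(x @ W + b), rate=0.5, training=True)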
Keywords :
Bayes methods; Gaussian processes; acoustic signal processing; hidden Markov models; neural nets; optimisation; Bayesian optimization code; DNN; English broadcast news task; GMM/HMM system; Gaussian mixture models; LVCSR; ReLU nonlinearities; acoustic models; computer vision tasks; deep neural networks; frame level training; generalization error; hyper-parameter tuning; random dropout procedure; rectified linear unit; sigmoid units; small-scale phone recognition task; vocabulary speech recognition benchmarks; Acoustics; Hidden Markov models; Neural networks; Optimization; Speech recognition; Training; Bayesian optimization; acoustic modeling; broadcast news; deep learning; dropout; neural networks; rectified linear units;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Conference_Location :
Vancouver, BC
ISSN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2013.6639346
Filename :
6639346