DocumentCode :
1697151
Title :
Improving deep neural networks for LVCSR using rectified linear units and dropout
Author :
Dahl, George E. ; Sainath, Tara N. ; Hinton, Geoffrey E.
Author_Institution :
Dept. of Comput. Sci., Univ. of Toronto, Toronto, ON, Canada
fYear :
2013
Firstpage :
8609
Lastpage :
8613
Abstract :
Recently, pre-trained deep neural networks (DNNs) have outperformed traditional acoustic models based on Gaussian mixture models (GMMs) on a variety of large vocabulary speech recognition benchmarks. Deep neural nets have also achieved excellent results on various computer vision tasks using a random “dropout” procedure that drastically reduces generalization error by randomly omitting a fraction of the hidden units in all layers. Since dropout helps avoid over-fitting, it has also been successful on a small-scale phone recognition task using larger neural nets. However, training deep neural net acoustic models for large vocabulary speech recognition takes a very long time and dropout is likely to only increase training time. Neural networks with rectified linear unit (ReLU) non-linearities have been highly successful for computer vision tasks and have proved faster to train than standard sigmoid units, sometimes also improving discriminative performance. In this work, we show on a 50-hour English Broadcast News task that modified deep neural networks using ReLUs trained with dropout during frame level training provide a 4.2% relative improvement over a DNN trained with sigmoid units, and a 14.4% relative improvement over a strong GMM/HMM system. We were able to obtain our results with minimal human hyper-parameter tuning using publicly available Bayesian optimization code.
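As a minimal sketch (not code from the paper), the NumPy snippet below illustrates the two ingredients the abstract combines: a rectified linear hidden layer and dropout, which omits each hidden unit with some probability during training. The layer sizes, the 0.5 dropout rate, and the "inverted" train-time rescaling are illustrative assumptions; the paper's formulation instead scales the weights down at test time.

import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # Rectified linear unit: max(0, x), applied element-wise.
    return np.maximum(0.0, x)

def dropout(h, rate, training):
    if not training:
        return h  # at test time all units are kept
    # Randomly omit a fraction `rate` of hidden units; rescale the
    # survivors so expected activations match test time (an assumption
    # of this sketch, not the paper's test-time weight scaling).
    mask = rng.random(h.shape) >= rate
    return h * mask / (1.0 - rate)

# One ReLU hidden layer on a dummy batch of acoustic feature frames.
x = rng.standard_normal((4, 40))          # 4 frames, 40-dim features
W = rng.standard_normal((40, 128)) * 0.1  # hypothetical layer weights
b = np.zeros(128)
h = dropout(relu(x @ W + b), rate=0.5, training=True)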
Keywords :
Bayes methods; Gaussian processes; acoustic signal processing; hidden Markov models; neural nets; optimisation; Bayesian optimization code; DNN; English broadcast news task; GMM/HMM system; Gaussian mixture models; LVCSR; ReLU nonlinearities; acoustic models; computer vision tasks; deep neural networks; frame level training; generalization error; hyper-parameter tuning; random dropout procedure; rectified linear unit; sigmoid units; small-scale phone recognition task; vocabulary speech recognition benchmarks; Acoustics; Hidden Markov models; Neural networks; Optimization; Speech recognition; Training; Bayesian optimization; acoustic modeling; broadcast news; deep learning; dropout; neural networks; rectified linear units;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Conference_Location :
Vancouver, BC
ISSN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2013.6639346
Filename :
6639346