DocumentCode :
177455
Title :
Mean-normalized stochastic gradient for large-scale deep learning
Author :
Wiesler, Simon ; Richard, Alexander ; Schlüter, Ralf ; Ney, Hermann
Author_Institution :
Comput. Sci. Dept., RWTH Aachen Univ., Aachen, Germany
fYear :
2014
fDate :
4-9 May 2014
Firstpage :
180
Lastpage :
184
Abstract :
Deep neural networks are typically optimized with stochastic gradient descent (SGD). In this work, we propose a novel second-order stochastic optimization algorithm. The algorithm is based on analytic results showing that a non-zero mean of features is harmful for the optimization. We prove convergence of our algorithm in a convex setting. In our experiments, we show that our proposed algorithm converges faster than SGD. Further, in contrast to earlier work, our algorithm allows for training models with a factorized structure from scratch. We found this structure to be very useful, not only because it accelerates training and decoding but also because it is a very effective safeguard against overfitting. By combining our proposed optimization algorithm with this model structure, we can reduce model size by a factor of eight and still obtain improvements in recognition error rate. Additional gains are obtained by improving the Newbob learning rate strategy.
Keywords :
convergence; gradient methods; learning (artificial intelligence); neural nets; speech recognition; stochastic programming; Newbob learning rate strategy; decoding; deep neural networks; factorized structure; large-scale deep learning; mean-normalized stochastic gradient; model size; model structure; nonzero mean; second-order stochastic optimization algorithm; speech recognition error rate; training model; Error analysis; Neural networks; Optimization; Speech; Stochastic processes; Training; LVCSR; deep learning
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location :
Florence
Type :
conf
DOI :
10.1109/ICASSP.2014.6853582
Filename :
6853582