DocumentCode :
134204
Title :
Improving training time of deep neural network with asynchronous averaged stochastic gradient descent
Author :
Zhao You ; Bo Xu
Author_Institution :
Interactive Digital Media Technol. Res. Center, Inst. of Autom., Beijing, China
fYear :
2014
fDate :
12-14 Sept. 2014
Firstpage :
446
Lastpage :
449
Abstract :
Deep neural network (DNN) acoustic models have shown large improvements in performance over Gaussian mixture models (GMMs) in recent studies. Stochastic gradient descent (SGD) is the most popular method for training deep neural networks, but training a DNN with minibatch-based SGD is slow: the updates are inherently serial and the whole training set must be scanned for many passes before reaching the asymptotic region, which makes it difficult to scale to large datasets. Training time can be reduced in two ways: reducing the number of training epochs and exploiting distributed training algorithms. Several distributed training algorithms, such as L-BFGS, Hessian-free optimization and asynchronous SGD, have been shown to significantly reduce training time. To reduce training time further, we explore a training algorithm with fast convergence and combine it with a distributed training algorithm. Averaged stochastic gradient descent (ASGD) has proven simple and effective for one-pass online learning. This paper investigates the asynchronous ASGD algorithm for deep neural network training. We tested asynchronous ASGD on a Mandarin Chinese recorded-speech recognition task using deep neural networks. Experimental results show that the performance of one-pass asynchronous ASGD is very close to that of multi-pass asynchronous SGD, while reducing training time by a factor of 6.3.
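The core idea behind ASGD referenced in the abstract is Polyak-Ruppert averaging: run ordinary SGD but return the running average of the iterates rather than the final iterate, which is what makes one-pass online learning effective. The sketch below is a minimal, generic illustration of that averaging step, not the authors' distributed implementation; the function names, the burn-in parameter, and the dense NumPy representation are illustrative assumptions, and the asynchronous parameter-server aggregation described in the paper is not shown.

```python
import numpy as np

def averaged_sgd(grad_fn, w0, data, lr=0.01, avg_start=0):
    """One pass of averaged SGD (Polyak-Ruppert averaging) -- illustrative sketch.

    grad_fn(w, x) is assumed to return the gradient of the loss on example x
    at parameters w. The averaged iterate w_bar is returned, not the last iterate.
    """
    w = w0.copy()        # current SGD iterate
    w_bar = w0.copy()    # running average of iterates
    n_avg = 0
    for t, x in enumerate(data):
        w -= lr * grad_fn(w, x)           # standard SGD update
        if t >= avg_start:                # optionally skip a burn-in period
            n_avg += 1
            w_bar += (w - w_bar) / n_avg  # incremental running mean
    return w_bar
```

In an asynchronous setting of the kind the abstract describes, each worker would apply such updates against a shared parameter copy without waiting for the others; the averaging of iterates is what allows a single pass over the data to approach the accuracy of multiple SGD passes.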
Keywords :
gradient methods; learning (artificial intelligence); neural nets; GMM; Gaussian mixture model; Hessian-free optimization algorithm; LBFGS algorithm; Mandarin Chinese recorded speech recognition task; SGD method; asynchronous SGD algorithm; asynchronous averaged stochastic gradient descent; deep neural network acoustic models; minibatch based SGD; network training time; one pass online learning; Acoustics; Neural networks; Optimization; Speech; Speech recognition; Stochastic processes; Training; Asynchronous averaged SGD; deep neural network; one pass learning; speech recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on
Conference_Location :
Singapore
Type :
conf
DOI :
10.1109/ISCSLP.2014.6936596
Filename :
6936596