DocumentCode :
134248
Title :
Rapid Bayesian learning for recurrent neural network language model
Author :
Jen-Tzung Chien ; Yuan-Chu Ku ; Mou-Yue Huang
Author_Institution :
Dept. of Electr. & Comput. Eng., Nat. Chiao Tung Univ., Hsinchu, Taiwan
fYear :
2014
fDate :
12-14 Sept. 2014
Firstpage :
34
Lastpage :
38
Abstract :
This paper presents Bayesian learning for the recurrent neural network language model (RNN-LM). Our goal is to regularize the RNN-LM by compensating for the randomness of the estimated model parameters, which is characterized by a Gaussian prior. The model is not only constructed by training the synaptic weight parameters according to the maximum a posteriori criterion but also regularized by estimating the Gaussian hyper-parameter through type-2 maximum likelihood. However, a critical issue in Bayesian RNN-LM is the heavy computation of the Hessian matrix, which is formed as the sum of a large number of outer products of high-dimensional gradient vectors. We present a rapid approximation that reduces the redundancy due to the curse of dimensionality and speeds up the calculation by summing only the salient outer products. Experiments on the 1B-Word Benchmark, Penn Treebank and Wall Street Journal corpora show that the rapid Bayesian RNN-LM consistently improves perplexity and word error rate in comparison with the standard RNN-LM.
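To make the Hessian approximation concrete, the following is a minimal sketch of the general idea described in the abstract: instead of summing the outer products of all per-sample gradient vectors, only the most "salient" gradients are kept. The function name, the use of gradient norm as the saliency score, and the NumPy setting are illustrative assumptions, not the authors' exact procedure.

    # Minimal sketch (assumed, not the paper's exact algorithm): approximate an
    # outer-product Hessian by summing only the outer products of the gradients
    # with the largest norms, rather than all T gradients.
    import numpy as np

    def approx_hessian(grads, num_salient):
        """grads: (T, D) array of per-sample gradient vectors.
        Returns a (D, D) approximation built from the num_salient
        gradients with the largest Euclidean norm (assumed saliency score)."""
        norms = np.linalg.norm(grads, axis=1)      # one saliency score per gradient
        top = np.argsort(norms)[-num_salient:]     # indices of the most salient gradients
        g = grads[top]                             # (num_salient, D)
        return g.T @ g                             # sum of outer products g_t g_t^T

    # Example: 10,000 gradients of dimension 200, keep only the 100 most salient.
    rng = np.random.default_rng(0)
    G = rng.standard_normal((10_000, 200))
    H_approx = approx_hessian(G, num_salient=100)
    print(H_approx.shape)  # (200, 200)

The point of the approximation is that the cost of forming the matrix scales with the number of retained outer products rather than with the full training-set size.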
Keywords :
Bayes methods; Hessian matrices; gradient methods; learning (artificial intelligence); parameter estimation; recurrent neural nets; speech recognition; 1B-Word Benchmark; Gaussian hyper-parameter estimation; Hessian matrix; Penn Treebank; Wall Street Journal corpora; high-dimensional gradient vector; maximum a posteriori criterion; model parameter estimation; rapid Bayesian RNN-LM; rapid Bayesian learning; rapid approximation; recurrent neural network language model; synaptic weight parameters; type-2 maximum likelihood; word error rate; Approximation methods; Bayes methods; Computational modeling; Recurrent neural networks; Speech recognition; Training; Vectors; Bayesian learning; Hessian matrix; Recurrent neural network language model; speech recognition;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on
Conference_Location :
Singapore
Type :
conf
DOI :
10.1109/ISCSLP.2014.6936640
Filename :
6936640