DocumentCode :
730667
Title :
Integrating Gaussian mixtures into deep neural networks: Softmax layer with hidden variables
Author :
Tuske, Zoltan ; Tahir, Muhammad Ali ; Schluter, Ralf ; Ney, Hermann
Author_Institution :
Comput. Sci. Dept., RWTH Aachen Univ., Aachen, Germany
fYear :
2015
fDate :
19-24 April 2015
Firstpage :
4285
Lastpage :
4289
Abstract :
In the hybrid approach, the neural network output directly serves as hidden Markov model (HMM) state posterior probability estimates. In contrast, in the tandem approach the neural network output is used as input features to improve classic Gaussian mixture model (GMM) based emission probability estimates. This paper shows that a GMM can easily be integrated into the deep neural network framework. By exploiting its equivalence with the log-linear mixture model (LMM), a GMM can be transformed into a large softmax layer followed by a summation pooling layer. Theoretical and experimental results indicate that the jointly trained and optimally chosen GMM and bottleneck tandem features cannot perform worse than a hybrid model. Thus, the question “hybrid vs. tandem” reduces to optimizing the output layer of a neural network. Speech recognition experiments are carried out on a broadcast news and conversations task using up to 12 feed-forward hidden layers with sigmoid and rectified linear unit activation functions. The evaluation of the LMM layer shows recognition gains over the classic softmax output.
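The "softmax layer followed by a summation pooling layer" can be illustrated with a minimal sketch. It assumes diagonal-covariance Gaussians and absorbs the quadratic term of each Gaussian log-density by appending the squared inputs as extra features, so every (state, component) pair gets a linear logit. The function names (lmm_layer, gmm_to_lmm, gmm_posteriors) and the nested-list parameter layout are hypothetical illustrations, not the authors' implementation. The sketch checks that the layer reproduces the GMM state posteriors obtained from Bayes' rule.

```python
import math
from typing import List, Sequence


def log_sum_exp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def lmm_layer(x: Sequence[float], weights, biases) -> List[float]:
    """Softmax over all (state, component) logits, then sum pooling per state.

    weights[s][i] and biases[s][i] belong to component i of state s
    (hypothetical layout). Returns the state posteriors p(s | x).
    """
    logits = [
        [sum(w_d * x_d for w_d, x_d in zip(w, x)) + b for w, b in zip(ws, bs)]
        for ws, bs in zip(weights, biases)
    ]
    # One softmax normaliser over every component of every state.
    log_z = log_sum_exp([a for row in logits for a in row])
    # Summation pooling: add the softmax outputs that belong to the same state.
    return [sum(math.exp(a - log_z) for a in row) for row in logits]


def gmm_to_lmm(priors, mix_weights, means, variances):
    """Map a diagonal-covariance GMM to LMM parameters on features [x, x**2].

    log(p(s) c_si N(x; mu, diag(var))) is linear in [x, x**2]:
    weight mu/var on x, weight -1/(2 var) on x**2, and a constant bias.
    """
    weights, biases = [], []
    for p_s, cs, mus, vs in zip(priors, mix_weights, means, variances):
        ws, bs = [], []
        for c, mu, var in zip(cs, mus, vs):
            linear = [m / v for m, v in zip(mu, var)]
            quadratic = [-0.5 / v for v in var]
            bias = math.log(p_s * c) - 0.5 * sum(
                m * m / v + math.log(2 * math.pi * v) for m, v in zip(mu, var)
            )
            ws.append(linear + quadratic)
            bs.append(bias)
        weights.append(ws)
        biases.append(bs)
    return weights, biases


def gmm_posteriors(x, priors, mix_weights, means, variances) -> List[float]:
    """Reference: p(s | x) by Bayes' rule from p(s) * sum_i c_si N(x; mu_si, var_si)."""
    log_joint = []
    for p_s, cs, mus, vs in zip(priors, mix_weights, means, variances):
        comps = [
            math.log(p_s * c)
            + sum(
                -0.5 * math.log(2 * math.pi * v) - (x_d - m) ** 2 / (2 * v)
                for x_d, m, v in zip(x, mu, var)
            )
            for c, mu, var in zip(cs, mus, vs)
        ]
        log_joint.append(log_sum_exp(comps))
    log_z = log_sum_exp(log_joint)
    return [math.exp(j - log_z) for j in log_joint]


if __name__ == "__main__":
    # Toy 2-state model: state 0 has two components, state 1 has one.
    priors = [0.6, 0.4]
    mix_weights = [[0.5, 0.5], [1.0]]
    means = [[[0.0], [2.0]], [[-1.0]]]
    variances = [[[1.0], [0.5]], [[2.0]]]
    x = [0.3]
    features = x + [x_d * x_d for x_d in x]  # augmented input [x, x**2]
    w, b = gmm_to_lmm(priors, mix_weights, means, variances)
    print(lmm_layer(features, w, b))
    print(gmm_posteriors(x, priors, mix_weights, means, variances))
```

Both printed lists agree up to floating-point rounding. Once the weights and biases are trained freely instead of being derived from a GMM, the same softmax-plus-pooling structure can serve as a neural network output layer, which is the view the abstract takes.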
Keywords :
Gaussian processes; broadcasting; hidden Markov models; mixture models; neural nets; optimisation; speech recognition; GMM; Gaussian mixture model; HMM; LMM; broadcast news; conversations task; deep neural networks; feedforward hidden layers; hidden Markov model; hidden variables; log-linear mixture model; optimization; posterior probability estimates; recognition gains; rectified linear unit activation functions; sigmoid; softmax layer; speech recognition; summation pooling layer; Acoustics; Approximation methods; Artificial neural networks; Hidden Markov models; Joints; Training; ASR; DNN; GMM; LMM; Log-linear; bottleneck; hybrid; mixture model; neural network; tandem;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
Conference_Location :
South Brisbane, QLD
Type :
conf
DOI :
10.1109/ICASSP.2015.7178779
Filename :
7178779