DocumentCode :
3744900
Title :
Speaker adaptive joint training of Gaussian mixture models and bottleneck features
Author :
Zoltán Tüske;Pavel Golik;Ralf Schlüter;Hermann Ney
Author_Institution :
Human Language Technology and Pattern Recognition, Computer Science Department, RWTH Aachen University, 52056 Aachen, Germany
fYear :
2015
Firstpage :
596
Lastpage :
603
Abstract :
In the tandem approach, the output of a neural network (NN) serves as input features to a Gaussian mixture model (GMM), aiming to improve the emission probability estimates. As shown in our previous work, a GMM with a pooled covariance matrix can be integrated into a neural network framework as a softmax layer with hidden variables, which allows for joint estimation of both the neural network and the Gaussian mixture parameters. Here, this approach is extended to include speaker adaptive training (SAT) by introducing a speaker dependent neural network layer. Error backpropagation beyond this speaker dependent layer simultaneously realizes the adaptive training of the Gaussian parameters and the optimization of the bottleneck (BN) tandem features of the underlying acoustic model. In this study, after initialization by constrained maximum likelihood linear regression (CMLLR), the speaker dependent layer itself is kept constant during the joint training. Experiments show that the deeper backpropagation through the speaker dependent layer is necessary for improved recognition performance. The speaker adaptively and jointly trained BN-GMM yields a 5% relative improvement over a very strong speaker-independent hybrid baseline on the Quaero English broadcast news and conversations task and on the 300-hour Switchboard task.
Keywords :
"Adaptation models","Training","Acoustics","Hidden Markov models","Robustness","Artificial neural networks"
Publisher :
ieee
Conference_Titel :
Automatic Speech Recognition and Understanding (ASRU), 2015 IEEE Workshop on
Type :
conf
DOI :
10.1109/ASRU.2015.7404850
Filename :
7404850