DocumentCode :
177461
Title :
On combining DNN and GMM with unsupervised speaker adaptation for robust automatic speech recognition
Author :
Shilin Liu ; Khe Chai Sim
Author_Institution :
Sch. of Comput., Nat. Univ. of Singapore, Singapore, Singapore
fYear :
2014
fDate :
4-9 May 2014
Firstpage :
195
Lastpage :
199
Abstract :
Recently, the context-dependent Deep Neural Network (CD-DNN) has been found to significantly outperform the Gaussian Mixture Model (GMM) on various large vocabulary continuous speech recognition tasks. Unlike GMM parameters, DNN parameters have no meaningful interpretation, which makes it difficult to devise effective adaptation methods for DNNs. Furthermore, DNN parameter estimation is based on discriminative criteria, which are more sensitive to label errors and therefore less reliable for unsupervised adaptation. Many effective adaptation techniques that have been developed and proven to work well for GMM/HMM systems cannot be easily applied to DNNs. Therefore, this paper proposes a novel method of combining DNN and GMM using the Temporally Varying Weight Regression framework to take advantage of the superior performance of DNNs and the robust adaptability of GMMs. This paper addresses the issue of incorporating the high-dimensional CD-DNN posteriors into this framework without dramatically increasing the system complexity. Experimental results on a broadcast news large vocabulary transcription task show that the proposed GMM+DNN/HMM system achieved a significant performance gain over the baseline DNN/HMM system. With additional unsupervised speaker adaptation, the best GMM+DNN/HMM system obtained a relative improvement of about 20% over the DNN/HMM baseline.
Keywords :
Gaussian processes; neural nets; parameter estimation; regression analysis; speaker recognition; CD-DNN; DNN parameter estimation; GMM/HMM systems; Gaussian mixture model; context dependent deep neural network; continuous speech recognition; robust automatic speech recognition; temporally varying weight regression framework; unsupervised adaptation; unsupervised speaker adaptation; vocabulary transcription; Acoustics; Adaptation models; Context; Hidden Markov models; Speech; Speech recognition; Training; Deep Neural Network; Gaussian mixture model; Speaker Adaptation;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location :
Florence
Type :
conf
DOI :
10.1109/ICASSP.2014.6853585
Filename :
6853585