DocumentCode :
179888
Title :
Direct adaptation of hybrid DNN/HMM model for fast speaker adaptation in LVCSR based on speaker code
Author :
Shaofei Xue ; Abdel-Hamid, Ossama ; Hui Jiang ; Lirong Dai
Author_Institution :
Nat. Eng. Lab. of Speech & Language Inf. Process., Univ. of Sci. & Technol. of China, Hefei, China
fYear :
2014
fDate :
4-9 May 2014
Firstpage :
6339
Lastpage :
6343
Abstract :
Recently, an effective fast speaker adaptation method using discriminative speaker codes (SC) has been proposed for hybrid DNN-HMM models in speech recognition [1]. This method relies on jointly learning a large generic adaptation neural network shared by all speakers together with many small speaker codes, using the standard back-propagation algorithm. In this paper, we propose an alternative, direct adaptation in model space, where speaker codes are connected directly to the original DNN model through a set of new connection weights, which can be estimated very efficiently from all or part of the training data. As a result, the proposed method is better suited to large-scale speech recognition tasks, since it eliminates the time-consuming training of a separate adaptation neural network. We have evaluated the proposed direct SC-based adaptation method on the large-scale 320-hr Switchboard task. Experimental results show that SC-based rapid adaptation is very effective not only for small recognition tasks but also for very large-scale tasks. For example, the proposed method yields up to an 8% relative reduction in word error rate on Switchboard using only a very small number of adaptation utterances per speaker (from 10 to a few dozen). Moreover, the extra training time required for adaptation is significantly reduced compared with the method in [1].
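The abstract describes speaker codes feeding into the original DNN through new connection weights. A minimal sketch of that idea, under stated assumptions, follows. It is not the authors' implementation. It assumes a single sigmoid hidden layer whose pre-activation is W x + B s + b, where W are the original weights, B the new code-connection weights, and s the speaker code. The abstract does not say which layers receive the code. It also assumes, as in speaker-code adaptation generally, that for a new speaker only s is updated by back-propagation while every weight stays fixed. All names (`forward`, `code_gradient`, `adapt_code`, the toy sizes) are hypothetical. The random weights stand in for a trained model.

```python
import math
import random

# Hypothetical sketch: one sigmoid hidden layer whose input is the acoustic
# frame x (via original weights W) plus a speaker code s (via new weights B).


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def softmax(z):
    top = max(z)
    e = [math.exp(v - top) for v in z]
    total = sum(e)
    return [v / total for v in e]


def forward(x, code, W, B, b, V, c):
    """Hidden a = W x + B s + b; output y = softmax(V sigmoid(a) + c)."""
    a = [wx + bs + bj for wx, bs, bj in zip(matvec(W, x), matvec(B, code), b)]
    h = [sigmoid(v) for v in a]
    y = softmax([vh + ck for vh, ck in zip(matvec(V, h), c)])
    return h, y


def cross_entropy(frames, labels, code, W, B, b, V, c):
    """Average frame-level cross-entropy against state labels."""
    return -sum(math.log(forward(x, code, W, B, b, V, c)[1][t])
                for x, t in zip(frames, labels)) / len(frames)


def code_gradient(frames, labels, code, W, B, b, V, c):
    """Back-propagate the cross-entropy down to the speaker code only."""
    grad = [0.0] * len(code)
    for x, t in zip(frames, labels):
        h, y = forward(x, code, W, B, b, V, c)
        dz = [yk - (1.0 if k == t else 0.0) for k, yk in enumerate(y)]  # softmax + CE
        dh = [sum(V[k][j] * dz[k] for k in range(len(dz))) for j in range(len(h))]
        da = [dh[j] * h[j] * (1.0 - h[j]) for j in range(len(h))]  # sigmoid derivative
        for i in range(len(code)):
            grad[i] += sum(B[j][i] * da[j] for j in range(len(da)))  # B^T da
    return [g / len(frames) for g in grad]


def adapt_code(frames, labels, code, params, lr=0.1, steps=50):
    """Gradient descent on the speaker code; all weights in params stay fixed."""
    code = list(code)
    for _ in range(steps):
        g = code_gradient(frames, labels, code, *params)
        code = [ci - lr * gi for ci, gi in zip(code, g)]
    return code


# Toy usage with random weights standing in for a trained DNN (illustrative only).
rng = random.Random(0)
D, H, K, S = 4, 6, 3, 2  # input dim, hidden units, output states, code size


def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


params = (rand_matrix(H, D), rand_matrix(H, S), [0.0] * H, rand_matrix(K, H), [0.0] * K)
frames = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(10)]  # "adaptation data"
labels = [rng.randrange(K) for _ in range(10)]
code0 = [0.0] * S
code1 = adapt_code(frames, labels, code0, params)
print(cross_entropy(frames, labels, code0, *params),
      cross_entropy(frames, labels, code1, *params))
```

The point of updating only the code is that it has just S free parameters per speaker. That is why a handful of adaptation utterances can be enough. A real system would use a multi-layer DNN, mini-batches, and weights B already estimated on training data.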
Keywords :
backpropagation; neural nets; speaker recognition; speech codecs; LVCSR; adaptation method; adaptation neural networks; direct SC-based adaptation method; direct adaptation; discriminative speaker code; fast speaker adaptation; hybrid DNN-HMM model; speaker code; speaker codes; speech recognition; standard backpropagation algorithm; switchboard task; time 320 hr; word error rate; Adaptation models; Hidden Markov models; Neural networks; Speech; Speech recognition; Training; Training data; Deep Neural Network (DNN); Fast Speaker Adaptation; Hybrid DNN-HMM; Speaker Code;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location :
Florence
Type :
conf
DOI :
10.1109/ICASSP.2014.6854824
Filename :
6854824