DocumentCode
730711
Title
Unsupervised speaker adaptation of deep neural network based on the combination of speaker codes and singular value decomposition for speech recognition
Author
Shaofei Xue ; Hui Jiang ; Lirong Dai ; Qingfeng Liu
Author_Institution
Nat. Eng. Lab. of Speech & Language Inf. Process., Univ. of Sci. & Technol. of China, Hefei, China
fYear
2015
fDate
19-24 April 2015
Firstpage
4555
Lastpage
4559
Abstract
We recently proposed a general adaptation scheme for deep neural networks (DNNs) based on discriminant condition codes and applied it to supervised speaker adaptation in speech recognition, using either the frame-level cross-entropy or the sequence-level maximum mutual information training criterion [1, 2, 3, 4]. In this setting, each condition code is associated with one speaker in the data and is therefore called a speaker code. Our previous work has shown that speaker-code-based methods are quite effective in adapting DNNs even when only a very small amount of adaptation data is available. However, obtaining the best ASR performance requires a large speaker code size and a complex training process, because good initialization of the speaker codes and connection weights is very important. In this paper, we propose using singular value decomposition (SVD), as in [5], to initialize the speaker codes and connection weights, achieving ASR performance comparable to our earlier results with a smaller speaker code size and much lower computational complexity. We also evaluate unsupervised speaker adaptation with the proposed method on the Switchboard large-vocabulary speech recognition task. Experimental results show that the method provides good initializations and is suitable for adapting large DNN models.
Keywords
entropy; neural nets; singular value decomposition; speech recognition; unsupervised learning; DNN; SVD; connection weights; deep neural network; discriminant condition codes; frame-level cross-entropy; large vocabulary speech recognition; sequence-level maximum mutual information training criterion; singular value decomposition; speaker code; unsupervised speaker adaptation; Adaptation models; Hidden Markov models; Matrix decomposition; Neural networks; Speech; Speech recognition; Training; Deep Neural Network (DNN); Speaker Adaptation; Speaker Code; singular value decomposition (SVD);
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
Conference_Location
South Brisbane, QLD
Type
conf
DOI
10.1109/ICASSP.2015.7178833
Filename
7178833