Title :
Unsupervised speaker adaptation of deep neural network based on the combination of speaker codes and singular value decomposition for speech recognition
Author :
Shaofei Xue ; Hui Jiang ; Lirong Dai ; Qingfeng Liu
Author_Institution :
Nat. Eng. Lab. of Speech & Language Inf. Process., Univ. of Sci. & Technol. of China, Hefei, China
Abstract :
Recently, we proposed a general adaptation scheme for deep neural networks (DNNs) based on discriminant condition codes and applied it to supervised speaker adaptation in speech recognition, using either the frame-level cross-entropy or the sequence-level maximum mutual information training criterion [1, 2, 3, 4]. In this setting, each condition code is associated with one speaker in the data and is therefore called a speaker code for convenience. Our previous work has shown that speaker code based methods are quite effective for adapting DNNs even when only a very small amount of adaptation data is available. However, achieving the best ASR performance has required a large speaker code size and complex procedures, because good initialization of the speaker codes and connection weights is very important. In this paper, we propose a method that uses singular value decomposition (SVD), as in [5], to initialize the speaker codes and connection weights, achieving ASR performance comparable to our previous results with a smaller speaker code size and much lower computational complexity. We also evaluate unsupervised speaker adaptation with the proposed method on the large vocabulary Switchboard speech recognition task. Experimental results show that the proposed method provides good initializations and is suitable for adapting large DNN models.
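The abstract names SVD as the tool for initializing connection weights but does not spell out the procedure. As a rough illustration only, the sketch below shows a generic truncated-SVD factorization of a DNN weight matrix W into two low-rank factors, W ≈ U_k (S_k V_k^T). It is not necessarily the exact procedure of [5] or of this paper, and how the speaker codes themselves are derived is not reproduced. The helper name `svd_init`, the 512 x 440 matrix size, and the rank of 64 are hypothetical choices.

```python
import numpy as np


def svd_init(weight: np.ndarray, rank: int) -> tuple[np.ndarray, np.ndarray]:
    """Split a weight matrix W (out_dim x in_dim) into two low-rank factors.

    Hypothetical helper illustrating generic truncated-SVD initialization:
    W ~= U_k S_k V_k^T is returned as (U_k, S_k V_k^T), so one linear layer
    can be replaced by two smaller ones that start from a good approximation.
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    upper = u[:, :rank]                    # out_dim x rank
    lower = s[:rank, None] * vt[:rank, :]  # rank x in_dim, rows scaled by singular values
    return upper, lower


# Hypothetical example: a random 512 x 440 matrix standing in for a trained DNN layer.
rng = np.random.default_rng(0)
w = rng.standard_normal((512, 440))
upper, lower = svd_init(w, rank=64)
approx = upper @ lower  # rank-64 approximation of w
```

The rank k trades parameter count against approximation quality. By the Eckart–Young theorem, the Frobenius-norm error of the truncated factorization equals the square root of the sum of the squared singular values that were discarded, so the singular value spectrum indicates how small k can be made.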
Keywords :
entropy; neural nets; singular value decomposition; speech recognition; unsupervised learning; DNN; SVD; connection weights; deep neural network; discriminant condition codes; frame-level cross-entropy; large vocabulary speech recognition; sequence-level maximum mutual information training criterion; singular value decomposition; speaker code; unsupervised speaker adaptation; Adaptation models; Hidden Markov models; Matrix decomposition; Neural networks; Speech; Speech recognition; Training; Deep Neural Network (DNN); Speaker Adaptation; Speaker Code; singular value decomposition (SVD);
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
Conference_Location :
South Brisbane, QLD
DOI :
10.1109/ICASSP.2015.7178833