DocumentCode :
134192
Title :
Speaker adaptive bottleneck features extraction for LVCSR based on discriminative learning of speaker codes
Author :
Changqing Kong ; Shaofei Xue ; Jianqing Gao ; Wu Guo ; Lirong Dai ; Hui Jiang
Author_Institution :
Nat. Eng. Lab. of Speech & Language Inf. Process., Univ. of Sci. & Technol. of China, Hefei, China
fYear :
2014
fDate :
12-14 Sept. 2014
Firstpage :
83
Lastpage :
87
Abstract :
Recently, several fast speaker adaptation methods based on so-called speaker codes (SC) have been proposed for the hybrid DNN-HMM speech recognition model [1, 2, 3]. In these methods, either the target speaker's features are transformed to match the given speaker-independent model, or the speaker-independent model is transformed towards one particular speaker; in both cases the transformation relies on discriminative learning of speaker codes. Previous research has shown that these SC-based adaptation methods are very effective for adapting large DNN models using only a small amount of adaptation data. In this work, we explore combining a direct model-space speaker adaptation technique based on speaker codes (mSA-SC) with bottleneck features, using mSA-SC as an instrument for extracting speaker adaptive bottleneck features. We evaluate the proposed extraction method on two speech recognition tasks, namely the PSC Mandarin task and the large-scale 320-hour Switchboard task. Experimental results indicate that the method is well suited to very large-scale tasks: for example, on Switchboard it achieves a 9% relative reduction in word error rate under an unsupervised speaker adaptation scheme.
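To make the idea of learning a speaker code and reading off speaker adaptive bottleneck features more concrete, here is a toy NumPy sketch. It is not the authors' implementation. It assumes one sigmoid hidden layer and one linear bottleneck layer, each receiving the speaker code through its own weight matrix (`B1`, `B2`), with all network weights frozen while only the code `s` is learned by gradient descent on frame-level cross-entropy. The class and method names, layer sizes, learning rate and random weights are hypothetical; in a real system the connection weights would be trained on many training speakers, and in the unsupervised scheme the frame labels would typically come from a first-pass decoding rather than the random labels used here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(p, labels):
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

class SpeakerCodeBottleneckNet:
    """Hypothetical toy DNN: input -> sigmoid hidden -> linear bottleneck -> softmax.
    Both hidden layers also receive the speaker code s via matrices B1, B2."""

    def __init__(self, d_in, d_hid, d_bn, n_out, d_code, seed=0):
        rng = np.random.default_rng(seed)
        # Random weights stand in for a network trained on many speakers (assumption).
        self.W1 = rng.normal(0, 0.1, (d_in, d_hid))
        self.B1 = rng.normal(0, 0.1, (d_code, d_hid))
        self.b1 = np.zeros(d_hid)
        self.W2 = rng.normal(0, 0.1, (d_hid, d_bn))
        self.B2 = rng.normal(0, 0.1, (d_code, d_bn))
        self.b2 = np.zeros(d_bn)
        self.W3 = rng.normal(0, 0.1, (d_bn, n_out))
        self.b3 = np.zeros(n_out)
        self.d_code = d_code

    def forward(self, X, s):
        h1 = sigmoid(X @ self.W1 + s @ self.B1 + self.b1)
        bn = h1 @ self.W2 + s @ self.B2 + self.b2  # linear bottleneck layer
        p = softmax(bn @ self.W3 + self.b3)
        return h1, bn, p

    def adapt_code(self, X, labels, steps=200, lr=0.5):
        """Learn one speaker code for all frames in X; network weights stay fixed."""
        s = np.zeros(self.d_code)
        onehot = np.eye(self.W3.shape[1])[labels]
        n = X.shape[0]
        for _ in range(steps):
            h1, bn, p = self.forward(X, s)
            d_logits = (p - onehot) / n          # gradient of mean cross-entropy
            d_bn = d_logits @ self.W3.T           # back through output layer
            d_z1 = (d_bn @ self.W2.T) * h1 * (1.0 - h1)  # back through sigmoid
            # s enters both layers, so its gradient sums both paths over all frames.
            grad_s = d_bn.sum(axis=0) @ self.B2.T + d_z1.sum(axis=0) @ self.B1.T
            s = s - lr * grad_s
        return s

    def bottleneck_features(self, X, s):
        """Speaker adaptive bottleneck features: bottleneck activations given code s."""
        return self.forward(X, s)[1]

# Toy adaptation data for one "speaker" with a skewed label distribution.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 20))
labels = rng.choice(5, size=100, p=[0.1, 0.1, 0.6, 0.1, 0.1])

net = SpeakerCodeBottleneckNet(d_in=20, d_hid=32, d_bn=10, n_out=5, d_code=8)
W1_before = net.W1.copy()
base_loss = cross_entropy(net.forward(X, np.zeros(8))[2], labels)
code = net.adapt_code(X, labels)
adapted_loss = cross_entropy(net.forward(X, code)[2], labels)
feats = net.bottleneck_features(X, code)
print(f"loss without code: {base_loss:.4f}, with adapted code: {adapted_loss:.4f}")
```

Because only the low-dimensional code is estimated per speaker, a small amount of adaptation data can suffice; the adapted bottleneck activations would then serve as input features to a downstream recognizer.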
Keywords :
learning (artificial intelligence); natural language processing; speaker recognition; speech coding; DNN model; LVCSR; PSC Mandarin task; Switchboard task; discriminative learning; extraction instrument; fast speaker adaptation method; hybrid DNN-HMM speech recognition model; mSA-SC; speaker adaptation technique; speaker adaptive bottleneck features extraction method; speaker codes; speaker-independent model; speech recognition task; target speaker feature; unsupervised speaker adaptation scheme; word error rate; Adaptation models; Feature extraction; Hidden Markov models; Neural networks; Speech recognition; Switches; Training; Bottleneck Features; Deep Neural Network (DNN); Hybrid DNN-HMM; Speaker Adaptation; Speaker Codes;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on
Conference_Location :
Singapore
Type :
conf
DOI :
10.1109/ISCSLP.2014.6936584
Filename :
6936584