Regional Information Center for Science and Technology - Speaker adaptive bottleneck features extraction for LVCSR based on discriminative learning of speaker codes

DocumentCode :

134192

Title :

Speaker adaptive bottleneck features extraction for LVCSR based on discriminative learning of speaker codes

Author :

Changqing Kong ; Shaofei Xue ; Jianqing Gao ; Wu Guo ; Lirong Dai ; Hui Jiang

Author_Institution :

Nat. Eng. Lab. of Speech & Language Inf. Process., Univ. of Sci. & Technol. of China, Hefei, China

fYear :

2014

fDate :

12-14 Sept. 2014

Firstpage :

Lastpage :

Abstract :

Recently, several fast speaker adaptation methods based on so-called speaker codes (SC) have been proposed for the hybrid DNN-HMM speech recognition model [1, 2, 3]. In these methods, either the target speaker's features are modified to match the given speaker-independent models, or the speaker-independent models are transformed towards one particular speaker, based on the discriminative learning of speaker codes. Previous research has shown that these SC-based adaptation methods are very effective at adapting large DNN models using only a small amount of adaptation data. In this work, we explore the combination of a direct speaker adaptation technique in model space based on speaker codes (mSA-SC) with bottleneck features, where mSA-SC is used as the instrument for extracting speaker adaptive bottleneck features. We evaluate the proposed speaker adaptive bottleneck feature extraction method on two speech recognition tasks, namely the PSC Mandarin task and the large-scale 320-hr Switchboard task. Experimental results verify that it is well suited to very large scale tasks. For example, on Switchboard it achieves a 9% relative reduction in word error rate in an unsupervised speaker adaptation scheme.

Keywords :

learning (artificial intelligence); natural language processing; speaker recognition; speech coding; DNN model; LVCSR; PSC Mandarin task; Switchboard task; discriminative learning; extraction instrument; fast speaker adaptation method; hybrid DNN-HMM speech recognition model; mSA-SC; speaker adaptation technique; speaker adaptive bottleneck features extraction method; speaker codes; speaker-independent model; speech recognition task; target speaker feature; unsupervised speaker adaptation scheme; word error rate; Adaptation models; Feature extraction; Hidden Markov models; Neural networks; Speech recognition; Switches; Training; Bottleneck Features; Deep Neural Network (DNN); Hybrid DNN-HMM; Speaker Adaptation; Speaker Codes;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on

Conference_Location :

Singapore

Type :

conf

DOI :

10.1109/ISCSLP.2014.6936584

Filename :

6936584

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=134192