Regional Information Center for Science and Technology - Fast speaker adaptation of hybrid NN/HMM model for speech recognition based on discriminative learning of speaker code

DocumentCode :

1693487

Title :

Fast speaker adaptation of hybrid NN/HMM model for speech recognition based on discriminative learning of speaker code

Author :

Abdel-Hamid, Ossama ; Hui Jiang

Author_Institution :

Dept. of Comput. Sci. & Eng., York Univ., Toronto, ON, Canada

Year :

2013

Firstpage :

7942

Lastpage :

7946

Abstract :

In this paper, we propose a new fast speaker adaptation method for the hybrid NN/HMM speech recognition model. The method relies on jointly learning a large generic adaptation neural network shared by all speakers and multiple small speaker codes (one per speaker). Joint training uses all training data, along with speaker labels, to update the adaptation NN weights and the speaker codes with the standard back-propagation algorithm. In this way, the learned adaptation NN can transform each speaker's features into a generic speaker-independent feature space when given a small speaker code. Adaptation to a new speaker is done simply by learning a new speaker code with the same back-propagation algorithm, without changing any NN weights. In this method, a separate speaker code is learned for each speaker, while the large adaptation NN is learned from the whole training set. The main advantage of this method is that the speaker codes are very small. As a result, the hybrid NN/HMM model can be adapted very quickly to each speaker using only a small amount of adaptation data (i.e., just a few utterances). Experimental results on TIMIT show that the method achieves over 10% relative reduction in phone error rate using only seven utterances for adaptation.
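
The adaptation step described in the abstract (learning a new speaker code by back-propagation while every network weight stays fixed) can be illustrated with a minimal sketch. This is not the authors' implementation: the single tanh adaptation layer, the layer sizes, the learning rate, the random weights standing in for a trained network, and all function names are assumptions made for illustration only.

```python
import math
import random

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def forward(x, code, W_x, W_s, W_out):
    """Adaptation layer conditioned on the speaker code, then a softmax classifier."""
    pre = [a + b for a, b in zip(matvec(W_x, x), matvec(W_s, code))]
    h = [math.tanh(p) for p in pre]
    logits = matvec(W_out, h)
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return h, [e / total for e in exps]

def mean_loss(frames, labels, code, W_x, W_s, W_out):
    """Average cross-entropy over the adaptation frames."""
    return -sum(math.log(forward(x, code, W_x, W_s, W_out)[1][y])
                for x, y in zip(frames, labels)) / len(frames)

def adapt_speaker_code(frames, labels, W_x, W_s, W_out, lr=0.05, steps=200):
    """Learn a speaker code by gradient descent; W_x, W_s and W_out are never updated."""
    code = [0.0] * len(W_s[0])
    for _ in range(steps):
        grad = [0.0] * len(code)
        for x, y in zip(frames, labels):
            h, p = forward(x, code, W_x, W_s, W_out)
            # Gradient of cross-entropy with respect to the logits.
            d_logits = [pj - (1.0 if j == y else 0.0) for j, pj in enumerate(p)]
            # Back-propagate through the classifier and the tanh nonlinearity.
            d_pre = [sum(W_out[j][i] * d_logits[j] for j in range(len(p))) * (1 - h[i] ** 2)
                     for i in range(len(h))]
            # Accumulate the gradient with respect to the speaker code only.
            for k in range(len(code)):
                grad[k] += sum(W_s[i][k] * d_pre[i] for i in range(len(h))) / len(frames)
        code = [c - lr * g for c, g in zip(code, grad)]
    return code

# Toy data: random weights stand in for a trained network (an assumption).
rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

W_x, W_s, W_out = rand_matrix(6, 4), rand_matrix(6, 3), rand_matrix(5, 6)
frames = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(20)]
labels = [rng.randrange(5) for _ in frames]

code = adapt_speaker_code(frames, labels, W_x, W_s, W_out)
loss_before = mean_loss(frames, labels, [0.0] * 3, W_x, W_s, W_out)
loss_after = mean_loss(frames, labels, code, W_x, W_s, W_out)
```

Because only the three-dimensional code is updated here, the number of free parameters per speaker is tiny, which is the property the abstract credits for making adaptation feasible from a few utterances.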

Keywords :

backpropagation; hidden Markov models; neural nets; speaker recognition; adaptation NN weights; discriminative learning; fast speaker adaptation; generic adaptation neural network; generic speaker-independent feature space; hybrid NN-HMM model; joint training method; speaker codes; speech recognition; standard back-propagation algorithm; Adaptation models; Artificial neural networks; Hidden Markov models; Speech; Speech recognition; Training; Vectors; Fast Adaptation; Hybrid NN-HMM; Neural Network; Speaker Code

Language :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on

Conference_Location :

Vancouver, BC

ISSN :

1520-6149

Type :

conf

DOI :

10.1109/ICASSP.2013.6639211

Filename :

6639211

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1693487