DocumentCode :
3485099
Title :
Robust seed model training for speaker adaptation using pseudo-speaker features generated by inverse CMLLR transformation
Author :
Itoh, Arata ; Hara, Sunao ; Kitaoka, Norihide ; Takeda, Kazuya
Author_Institution :
Dept. of Inf. Sci., Nagoya Univ., Nagoya, Japan
fYear :
2011
fDate :
11-15 Dec. 2011
Firstpage :
169
Lastpage :
172
Abstract :
In this paper, we propose a novel acoustic model training method that is suitable for speaker adaptation in speech recognition. Our method is based on feature generation from the data of a small number of speakers. Speaker adaptation methods have been widely used for decades. Such methods need a certain amount of adaptation data, and if the data is insufficient, speech recognition performance degrades significantly. If the seed models to be adapted to a specific speaker cover a wider range of speakers, speaker adaptation can perform robustly. To build such robust seed models, we adopt feature generation based on the inverse maximum likelihood linear regression (MLLR) transformation, and then train our seed models using these features. First, we obtain MLLR transformation matrices from a limited number of existing speakers. Then we extract the bases of the MLLR transformation matrices using PCA. We estimate the distribution of the weight parameters that express the existing speakers' MLLR transformation matrices in terms of these bases. Next, we generate pseudo-speaker MLLR transformations by sampling weight parameters from this distribution, and apply the inverse of each transformation to the normalized features of the existing speakers to generate the pseudo-speakers' features. Finally, we train the acoustic seed models using these features. Using these seed models, we obtained better speaker adaptation results than with models that were simply environmentally adapted.
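The generation steps in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes that each speaker's transformation is an affine map [A | b] that normalizes features as x_norm = A x + b, and that the PCA weights are drawn from independent Gaussians; the abstract specifies neither. All function names are hypothetical.

```python
import numpy as np

def fit_transform_basis(transforms, n_components):
    """PCA over flattened per-speaker affine transforms [A | b]."""
    X = np.stack([t.ravel() for t in transforms])      # (speakers, d*(d+1))
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions of the centred transforms.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = Vt[:n_components]
    weights = (X - mean) @ basis.T                     # (speakers, n_components)
    return mean, basis, weights

def sample_pseudo_transform(mean, basis, weights, dim, rng):
    """Draw one pseudo-speaker transform from the weight distribution."""
    # Assumption: independent Gaussian per weight, fitted to existing speakers.
    mu = weights.mean(axis=0)
    sigma = weights.std(axis=0)
    w = rng.normal(mu, sigma)
    W = (mean + w @ basis).reshape(dim, dim + 1)
    return W[:, :dim], W[:, dim]                       # A, b

def generate_pseudo_features(normalized, A, b):
    """Invert x_norm = A x + b to get pseudo-speaker features x."""
    return np.linalg.solve(A, (normalized - b).T).T
```

In this sketch, the features returned by generate_pseudo_features for many sampled (A, b) pairs would be pooled as training data for the seed models.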
Keywords :
maximum likelihood estimation; regression analysis; speaker recognition; acoustic model training method; inverse CMLLR transformation; maximum likelihood linear regression transformation-based feature generation; pseudo-speaker features; robust seed model training; speaker adaptation; speech recognition; Acoustics; Adaptation models; Hidden Markov models; Speech; Speech recognition; Training; Training data;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop on
Conference_Location :
Waikoloa, HI
Print_ISBN :
978-1-4673-0365-1
Electronic_ISBN :
978-1-4673-0366-8
Type :
conf
DOI :
10.1109/ASRU.2011.6163925
Filename :
6163925