Regional Information Center for Science and Technology - Robust seed model training for speaker adaptation using pseudo-speaker features generated by inverse CMLLR transformation

DocumentCode :

3485099

Title :

Robust seed model training for speaker adaptation using pseudo-speaker features generated by inverse CMLLR transformation

Author :

Itoh, Arata ; Hara, Sunao ; Kitaoka, Norihide ; Takeda, Kazuya

Author_Institution :

Dept. of Inf. Sci., Nagoya Univ., Nagoya, Japan

fYear :

2011

fDate :

11-15 Dec. 2011

Firstpage :

169

Lastpage :

172

Abstract :

In this paper, we propose a novel acoustic model training method that is suitable for speaker adaptation in speech recognition. Our method is based on generating features from a small amount of speaker data. Speaker adaptation methods have been widely used for decades. Such methods require a certain amount of adaptation data, and if the data is insufficient, speech recognition performance degrades significantly. If the seed models that are adapted to a specific speaker cover a wider range of speakers, speaker adaptation can be performed more robustly. To build such robust seed models, we adopt feature generation based on the inverse maximum likelihood linear regression (MLLR) transformation and then train our seed models on the generated features. First, we obtain MLLR transformation matrices for a limited number of existing speakers. We then extract the bases of these matrices using principal component analysis (PCA) and estimate the distribution of the weight parameters that express each existing speaker's MLLR transformation matrix in terms of those bases. Next, we generate pseudo-speaker MLLR transformations by sampling weight parameters from this distribution, and apply the inverse of each transformation to the normalized features of the existing speakers to generate pseudo-speaker features. Finally, we train the acoustic seed models on these features. With these seed models, we obtained better speaker adaptation results than with models that were simply adapted to the environment.
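The pipeline in the abstract (PCA over per-speaker transforms, a distribution over PCA weights, sampling, then inverse transformation of normalized features) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: following the title, each transform is treated as a constrained (feature-space) MLLR matrix W = [A | b] with x_hat = A x + b; the per-speaker transforms are taken as given; PCA is applied after mean removal; the weights are modelled by an independent Gaussian per dimension; and all function names and the toy data are hypothetical. The record does not state the actual distribution or number of bases used.

```python
import numpy as np


def fit_transform_space(transforms, n_bases):
    """PCA over vectorised CMLLR transforms W_s = [A_s | b_s], each d x (d+1).

    Assumption: a diagonal Gaussian over the PCA weights stands in for the
    paper's (unspecified) weight distribution. n_bases <= number of speakers.
    """
    X = np.stack([W.ravel() for W in transforms])      # (S, d*(d+1))
    mean = X.mean(axis=0)
    # Right singular vectors of the centred data are the PCA bases.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    bases = vt[:n_bases]                               # (K, d*(d+1))
    weights = (X - mean) @ bases.T                     # (S, K) per-speaker weights
    return mean, bases, weights.mean(axis=0), weights.std(axis=0)


def sample_pseudo_transform(mean, bases, w_mu, w_sd, d, rng):
    """Draw PCA weights from the fitted Gaussian and rebuild W = [A | b]."""
    c = rng.normal(w_mu, w_sd)                         # (K,) sampled weights
    W = (mean + c @ bases).reshape(d, d + 1)
    return W[:, :d], W[:, d]                           # A, b


def inverse_cmllr(features, A, b):
    """Map normalised frames x_hat back to a pseudo speaker: x = A^{-1}(x_hat - b)."""
    # Solve A x = (x_hat - b) for all frames at once instead of forming A^{-1}.
    return np.linalg.solve(A, (features - b).T).T


# Toy usage with synthetic data (hypothetical: 10 speakers, 3-dim features).
rng = np.random.default_rng(0)
d = 3
speaker_transforms = [
    np.hstack([np.eye(d) + 0.05 * rng.standard_normal((d, d)),
               0.1 * rng.standard_normal((d, 1))])
    for _ in range(10)
]
mean, bases, w_mu, w_sd = fit_transform_space(speaker_transforms, n_bases=4)
A, b = sample_pseudo_transform(mean, bases, w_mu, w_sd, d, rng)
normalised = rng.standard_normal((100, d))   # stand-in for CMLLR-normalised frames
pseudo = inverse_cmllr(normalised, A, b)
print(pseudo.shape)  # (100, 3)
```

Applying the forward transform A x + b to the generated frames recovers the normalized input, which is a quick sanity check on the inversion. Repeating the sampling step yields as many pseudo speakers as desired from the same normalized data.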

Keywords :

maximum likelihood estimation; regression analysis; speaker recognition; acoustic model training method; inverse CMLLR transformation; maximum likelihood linear regression transformation-based feature generation; pseudo-speaker features; robust seed model training; speaker adaptation; speech recognition; Acoustics; Adaptation models; Hidden Markov models; Speech; Speech recognition; Training; Training data;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop on

Conference_Location :

Waikoloa, HI

Print_ISBN :

978-1-4673-0365-1

Electronic_ISBN :

978-1-4673-0366-8

Type :

conf

DOI :

10.1109/ASRU.2011.6163925

Filename :

6163925

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3485099