DocumentCode
980883
Title
Embedded kernel eigenvoice speaker adaptation and its implication to reference speaker weighting
Author
Mak, Brian Kan-Wing ; Hsiao, Roger Wend-Huu ; Ho, Simon Ka-Lung ; Kwok, James T.
Author_Institution
Department of Computer Science, Hong Kong University of Science and Technology
Volume
14
Issue
4
fYear
2006
fDate
7/1/2006
Firstpage
1267
Lastpage
1280
Abstract
Recently, we proposed an improvement to conventional eigenvoice (EV) speaker adaptation using kernel methods. In our kernel eigenvoice (KEV) speaker adaptation, speaker supervectors are mapped to a kernel-induced high-dimensional feature space, where eigenvoices are computed using kernel principal component analysis. A new speaker model is then constructed as a linear combination of the leading eigenvoices in the kernel-induced feature space. KEV adaptation was shown to outperform EV, maximum a posteriori (MAP), and maximum likelihood linear regression (MLLR) adaptation on a TIDIGITS task with less than 10 s of adaptation speech. However, because of the many kernel evaluations it requires, both adaptation and subsequent recognition in KEV adaptation are considerably slower than in conventional EV adaptation. In this paper, we solve the efficiency problem and eliminate all kernel evaluations involving adaptation or testing observations by finding an approximate pre-image of the implicit adapted model that KEV adaptation finds in the feature space; we call the new method embedded kernel eigenvoice (eKEV) adaptation. eKEV adaptation is faster than KEV adaptation, and subsequent recognition runs as fast as normal HMM decoding. eKEV adaptation uses a multidimensional scaling technique so that the resulting adapted model lies in the span of a subset of carefully chosen training speakers. It is therefore related to the reference speaker weighting (RSW) adaptation method, which is based on speaker clustering. Our experimental results on the Wall Street Journal (WSJ) corpus show that eKEV adaptation continues to outperform EV, MAP, MLLR, and the original RSW method. However, by adopting our way of choosing the subset of reference speakers for eKEV adaptation, RSW adaptation can also be improved so that it performs as well as eKEV adaptation.
Keywords
eigenvalues and eigenfunctions; hidden Markov models; maximum likelihood estimation; principal component analysis; regression analysis; speaker recognition; HMM decoding; MAP; embedded kernel eigenvoice speaker adaptation; high dimensional feature space; kernel principal component analysis; maximum a posteriori adaptation; maximum likelihood linear regression; multidimensional scaling technique; preimage approximation; reference speaker weighting; speaker clustering; speaker supervectors; subsequent recognition; Councils; Decoding; Hidden Markov models; Kernel; Management training; Maximum likelihood linear regression; Multidimensional systems; Principal component analysis; Speech recognition; Testing; Composite kernels; eigenvoice speaker adaptation; kernel eigenvoice speaker adaptation; kernel principal component analysis (PCA); pre-image problem
fLanguage
English
Journal_Title
IEEE Transactions on Audio, Speech, and Language Processing
Publisher
IEEE
ISSN
1558-7916
Type
jour
DOI
10.1109/TSA.2005.860836
Filename
1643654