DocumentCode :
1532339
Title :
Adaptation of Hidden Markov Models Using Model-as-Matrix Representation
Author :
Jeong, Yongwon
Author_Institution :
School of Electrical Engineering, Pusan National University, Busan, Korea
Volume :
20
Issue :
8
fYear :
2012
Firstpage :
2352
Lastpage :
2364
Abstract :
In this paper, we describe basis-based speaker adaptation techniques using the matrix representation of training models. Bases are obtained from training models by decomposition techniques for matrix-variate objects: two-dimensional principal component analysis (2DPCA) and generalized low rank approximations of matrices (GLRAM). The motivation for using matrix representation is that the sample covariance matrix of training models can be more accurately computed and the speaker weight becomes a matrix. Speaker adaptation equations are derived in the maximum-likelihood (ML) framework, and the adaptation equations can be solved using the maximum-likelihood linear regression technique. Additionally, novel applications of probabilistic 2DPCA and GLRAM to speaker adaptation are presented. From the probabilistic 2DPCA/GLRAM of training models, speaker adaptation equations are formulated in the maximum a posteriori (MAP) framework. The adaptation equations can be solved using the MAP linear regression technique. In the isolated-word experiments, the matrix representation-based methods in the ML and MAP frameworks outperformed maximum-likelihood linear regression adaptation, MAP adaptation, eigenvoice, and probabilistic PCA-based model for adaptation data longer than 20 s. Furthermore, the adaptation methods using probabilistic 2DPCA/GLRAM showed additional performance improvement over the adaptation methods using 2DPCA/GLRAM for small amounts of adaptation data.
Keywords :
Adaptation models; Computational modeling; Covariance matrix; Hidden Markov models; Mathematical model; Training; Vectors; Generalized low rank approximations of matrices; matrix-variate distribution; speaker adaptation; speech recognition; two-dimensional principal component analysis (2DPCA)
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2012.2202649
Filename :
6212332