DocumentCode :
179882
Title :
Two-stage speaker adaptation in subspace Gaussian mixture models
Author :
Ghalehjegh, Sina Hamidi ; Rose, Richard C.
Author_Institution :
Dept. of Electr. & Comput. Eng., McGill Univ., Montreal, QC, Canada
fYear :
2014
fDate :
4-9 May 2014
Firstpage :
6324
Lastpage :
6328
Abstract :
A two-stage speaker adaptation approach is proposed for the subspace Gaussian mixture model (SGMM) [1] in large-vocabulary automatic speech recognition (ASR). The SGMM differs from the better-known continuous density hidden Markov model (CDHMM) in that a large portion of its parameters are dedicated to shared full-covariance Gaussian subspace parameters, while a relatively small number are used for state-dependent projection vectors. Both model-space and feature-space adaptation are investigated. First, an efficient regression-based approach for subspace vector adaptation (SVA) is presented. Second, an efficient approach is presented for feature-space adaptation using constrained maximum likelihood linear regression (CMLLR) in the SGMM. While both of these adaptation scenarios have previously been investigated in the context of the SGMM [2, 3], a more efficient and numerically stable procedure is presented here for estimating the parameters of the regression-based transformations. Both transformation matrices are obtained using an optimization technique that iteratively updates the rows of the regression matrices. It is shown that using these feature-space and model-space approaches for unsupervised speaker adaptation provides complementary improvements in SGMM-based ASR word accuracy.
Keywords :
Gaussian processes; Markov processes; optimisation; regression analysis; speaker recognition; ASR word accuracy; CDHMM; CMLLR; SGMM; SGMM parameters; constrained maximum likelihood linear regression; continuous density hidden Markov model; covariance Gaussian subspace parameters; feature space adaptation; optimization technique; regression based transformations; regression matrices; state dependent projection vectors; subspace Gaussian mixture models; subspace vector adaptation; two-stage speaker adaptation; two-stage speaker adaptation approach; unsupervised speaker adaptation; Acoustics; Adaptation models; Gaussian mixture model; Hidden Markov models; Speech; Speech recognition; Vectors; Constrained MLLR; Phonetic subspace; Row-by-row update; Speaker adaptation;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location :
Florence
Type :
conf
DOI :
10.1109/ICASSP.2014.6854821
Filename :
6854821