DocumentCode :
417257
Title :
Basis superposition precision matrix modelling for large vocabulary continuous speech recognition
Author :
Sim, K.C. ; Gales, M.J.F.
Author_Institution :
Dept. of Eng., Cambridge Univ., UK
Volume :
1
fYear :
2004
fDate :
17-21 May 2004
Abstract :
An important aspect of using Gaussian mixture models in HMM-based speech recognition systems is the form of the covariance matrix. One successful approach has been to model the inverse covariance (precision) matrix by superimposing multiple bases. This paper presents a general framework for basis superposition. Models are described in terms of parameter tying of the basis coefficients and restrictions on the number of bases. Two forms of parameter tying are described, both of which provide a compact model structure. The first constrains the basis coefficients over multiple basis vectors (or matrices); this is related to the Subspace for Precision and Mean (SPAM) model. The second constrains the basis coefficients over multiple components, yielding heteroscedastic LDA (HLDA) as one example. Both maximum likelihood and minimum phone error training of these models are discussed. The performance of various configurations is examined on a conversational telephone speech task, SwitchBoard.
Keywords :
Gaussian distribution; covariance matrices; hidden Markov models; matrix inversion; maximum likelihood estimation; speech processing; speech recognition; telephony; vocabulary; Gaussian mixture models; HLDA; HMM-based speech recognition systems; SPAM model; Subspace for Precision and Mean model; SwitchBoard; basis superposition; compact model structure; conversational telephone speech task; heteroscedastic LDA; inverse covariance matrix; large vocabulary continuous speech recognition; maximum likelihood training; minimum phone error training; multiple basis vectors; parameter tying; performance; precision matrix modelling; Covariance matrix; Hidden Markov models; Inverse problems; Linear discriminant analysis; Maximum likelihood estimation; Speech recognition; Symmetric matrices; Telephony; Unsolicited electronic mail; Vocabulary;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
ISSN :
1520-6149
Print_ISBN :
0-7803-8484-9
Type :
conf
DOI :
10.1109/ICASSP.2004.1326107
Filename :
1326107