DocumentCode
417257
Title
Basis superposition precision matrix modelling for large vocabulary continuous speech recognition
Author
Sim, K.C. ; Gales, M.J.F.
Author_Institution
Dept. of Eng., Cambridge Univ., UK
Volume
1
fYear
2004
fDate
17-21 May 2004
Abstract
An important aspect of using Gaussian mixture models in an HMM-based speech recognition system is the form of the covariance matrix. One successful approach has been to model the inverse covariance (precision) matrix by superimposing multiple bases. This paper presents a general framework for basis superposition. Models are described in terms of parameter tying of the basis coefficients and restrictions on the number of bases. Two forms of parameter tying are described which provide a compact model structure. The first constrains the basis coefficients over multiple basis vectors (or matrices). This is related to the Subspace for Precision and Mean (SPAM) model. The second constrains the basis coefficients over multiple components, yielding as one example heteroscedastic LDA (HLDA). Both maximum likelihood and minimum phone error training of these models are discussed. The performance of various configurations is examined on a conversational telephone speech task, SwitchBoard.
Keywords
Gaussian distribution; covariance matrices; hidden Markov models; matrix inversion; maximum likelihood estimation; speech processing; speech recognition; telephony; vocabulary; Gaussian mixture models; HLDA; HMM-based speech recognition systems; SPAM model; Subspace for Precision and Mean model; SwitchBoard; basis superposition; compact model structure; conversational telephone speech task; heteroscedastic LDA; inverse covariance matrix; large vocabulary continuous speech recognition; maximum likelihood training; minimum phone error training; multiple basis vectors; parameter tying; performance; precision matrix modelling; Covariance matrix; Hidden Markov models; Inverse problems; Linear discriminant analysis; Maximum likelihood estimation; Speech recognition; Symmetric matrices; Telephony; Unsolicited electronic mail; Vocabulary;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
ISSN
1520-6149
Print_ISBN
0-7803-8484-9
Type
conf
DOI
10.1109/ICASSP.2004.1326107
Filename
1326107