Optimal tying of HMM mixture densities using decision trees

Author

Boulianne, Gilles ; Kenny, Patrick

Author_Institution

Spoken Word Technol., Montreal, Que., Canada

Volume

1

fYear

1996

fDate

3-6 Oct 1996

Firstpage

350

Abstract

The most detailed acoustic models in our two-pass, speaker-independent continuous speech recognition system are context-dependent models, which become harder to train adequately as the number of distinct contexts grows. Tying model parameters or clustering model densities with bottom-up agglomerative procedures can efficiently reduce the number of parameters to train, but these approaches suffer from the additional problem of how to model untrained contexts. Top-down clustering with a decision tree can provide well-trained models for any context, whether seen or unseen in training. Trees are built from a root node that is successively split by selecting, among questions about phonetic context, the one that provides the best segregation of the data. Several goodness-of-split criteria have been proposed, such as Poisson-based (Bahl et al., 1991) or single-Gaussian-based (Bahl et al., 1994) criteria, the choice being motivated primarily by computational considerations. We show, from maximum likelihood considerations, how to derive a computationally efficient criterion based on a different approximation using tied mixtures of Gaussian densities.
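
The split-selection step described in the abstract can be illustrated with a minimal sketch. It is not the tied-mixture criterion derived in the paper, which the abstract does not spell out. It assumes the simpler single-Gaussian likelihood gain as the goodness-of-split measure, scalar acoustic features, and toy data. The names gaussian_loglik, best_question, left_is_nasal and right_is_a are hypothetical and chosen only for illustration.

```python
import math

def gaussian_loglik(values):
    """Log-likelihood of 1-D data under a single Gaussian fitted by maximum likelihood."""
    n = len(values)
    mean = sum(values) / n
    # Variance floor (an assumption) keeps the log finite for tiny clusters.
    var = max(sum((v - mean) ** 2 for v in values) / n, 1e-6)
    # At the ML estimate the log-likelihood reduces to -n/2 * (log(2*pi*var) + 1).
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)

def best_question(samples, questions):
    """Pick the phonetic-context question with the largest log-likelihood gain.

    samples:   list of (context, value) pairs at the node being split
    questions: dict mapping a question name to a yes/no predicate on the context
    """
    parent = gaussian_loglik([v for _, v in samples])
    best_name, best_gain = None, 0.0
    for name, predicate in questions.items():
        yes = [v for c, v in samples if predicate(c)]
        no = [v for c, v in samples if not predicate(c)]
        if not yes or not no:
            continue  # the question does not actually split this node
        gain = gaussian_loglik(yes) + gaussian_loglik(no) - parent
        if gain > best_gain:
            best_name, best_gain = name, gain
    return best_name, best_gain

# Toy data (assumed): context = (left phone, right phone), value = one scalar feature.
samples = [(("m", "a"), 0.10), (("n", "a"), 0.20), (("m", "i"), 0.15),
           (("s", "a"), 2.00), (("t", "i"), 2.10), (("s", "i"), 1.90)]
questions = {
    "left_is_nasal": lambda c: c[0] in {"m", "n"},
    "right_is_a": lambda c: c[1] == "a",
}
name, gain = best_question(samples, questions)
```

Because every question is a predicate on the context alone, a context never seen in training, such as ("n", "u"), can still be routed down the tree to a trained leaf, which is the property the abstract attributes to top-down clustering.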

Keywords

Gaussian processes; decision theory; hidden Markov models; maximum likelihood estimation; speech recognition; trees (mathematics); Gaussian densities; Gaussian-based method; HMM mixture density tying; Poisson-based method; acoustic models; bottom-up procedures; context-dependent models; continuous speech recognition system; data segregation; decision trees; goodness of split criterion; hidden Markov model; maximum likelihood estimation; model density clustering; model parameters; phonetic context; top-down clustering; training; two-pass speaker-independent recognition; Context modeling; Decision trees; Gaussian processes; Hidden Markov models; Maximum likelihood estimation; Speech recognition;

fLanguage

English

Publisher

IEEE

Conference_Titel

Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on

Conference_Location

Philadelphia, PA

Print_ISBN

0-7803-3555-4

Type

conf

DOI

10.1109/ICSLP.1996.607126

Filename

607126