Title :
A speaker-independent continuous speech recognition system using continuous mixture Gaussian density HMM of phoneme-sized units
Author_Institution :
Panasonic Technol. Inc., Santa Barbara, CA, USA
fDate :
7/1/1993
Abstract :
The author describes a large-vocabulary, speaker-independent, continuous speech recognition system based on hidden Markov modeling (HMM) of phoneme-sized acoustic units using continuous mixture Gaussian densities. A bottom-up merging algorithm is developed for estimating the parameters of the mixture Gaussian densities; the resulting number of mixture components is proportional to both the sample size and the dispersion of the training data. A compression procedure is developed to construct a word transcription dictionary from the acoustic-phonetic labels of sentence utterances. A modified word-pair grammar using context-sensitive grammatical parts is incorporated to constrain task difficulty. Viterbi beam search is used for decoding. The segmental K-means algorithm is implemented as a baseline for evaluating the bottom-up merging technique. The system has been evaluated on the TIMIT database (1990) with a vocabulary size of 853. For test-set perplexities of 24, 104, and 853, the decoding word accuracies are 90.9%, 86.0%, and 62.9%, respectively. At a perplexity of 104, the decoding accuracy achieved with the merging algorithm is 4.1 percentage points higher than that achieved with segmental K-means (a 22.8% error reduction), and the decoding accuracy with the compressed dictionary is 3.0 percentage points higher than that with a standard dictionary (an 18.1% error reduction).
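The abstract names Viterbi beam search as the decoder but gives no details of it. As a rough illustration only, and not the author's implementation, the following Python sketch runs Viterbi decoding over a single HMM state space with log-domain scores and, at each frame, prunes every hypothesis that scores more than a fixed beam width below the best one. The function names `viterbi_beam` and `prune`, the argument layout, and the flat state space (no word network or word-pair grammar) are assumptions made for this example.

```python
import math

NEG_INF = float("-inf")


def prune(scores, beam):
    """Keep only hypotheses within `beam` (log domain) of the best score."""
    if not scores:
        raise ValueError("no surviving hypotheses; widen the beam")
    best = max(scores.values())
    return {s: v for s, v in scores.items() if v >= best - beam}


def viterbi_beam(log_init, log_trans, log_emit, beam):
    """Hypothetical Viterbi beam search over one HMM state space.

    log_init[s]     log initial probability of state s
    log_trans[s][t] log transition probability from state s to state t
    log_emit[f][s]  log likelihood of observation frame f in state s
    beam            log-domain beam width used for pruning
    Returns (state_path, best_log_score).
    """
    n_states = len(log_init)
    # Frame 0: score every state that can start the utterance, then prune.
    active = prune({s: log_init[s] + log_emit[0][s]
                    for s in range(n_states) if log_init[s] > NEG_INF}, beam)
    backptrs = []  # backptrs[i] maps a state at frame i+1 to its predecessor
    for frame in log_emit[1:]:
        new_scores, new_bp = {}, {}
        # Extend only the hypotheses that survived pruning.
        for prev, prev_score in active.items():
            for nxt in range(n_states):
                cand = prev_score + log_trans[prev][nxt] + frame[nxt]
                if cand > new_scores.get(nxt, NEG_INF):
                    new_scores[nxt] = cand
                    new_bp[nxt] = prev
        active = prune(new_scores, beam)
        backptrs.append({s: new_bp[s] for s in active})
    # Trace back from the best surviving state in the last frame.
    last = max(active, key=active.get)
    path = [last]
    for bp in reversed(backptrs):
        path.append(bp[path[-1]])
    path.reverse()
    return path, active[last]
```

A wide beam reproduces exact Viterbi decoding, while a narrow beam saves computation but can discard the hypothesis that would have been optimal; in a toy two-state example, a beam of 0 keeps only the locally best state per frame and can return a lower-scoring path than a beam of 100.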
Keywords :
context-sensitive grammars; decoding; hidden Markov models; parameter estimation; speech recognition; HMM; TIMIT database; Viterbi beam search; acoustic-phonetic labels; bottom-up merging algorithm; compression procedure; context-sensitive grammatical parts; continuous mixture Gaussian density; hidden Markov model; large vocabulary; modified word-pair grammar; perplexity; phoneme-sized acoustic units; segmental K-means algorithm; sentence utterances; speaker-independent continuous speech recognition; word transcription dictionary; Acoustic beams; Decoding; Dictionaries; Hidden Markov models; Merging; Parameter estimation; Speech recognition; Training data; Viterbi algorithm; Vocabulary
Journal_Title :
IEEE Transactions on Speech and Audio Processing