DocumentCode
394345
Title
Coarticulation modeling by embedding a target-directed hidden trajectory model into HMM - MAP decoding and evaluation
Author
Seide, Frank ; Zhou, Jian-lai ; Deng, Li
Author_Institution
5F Beijing Sigma Center, Microsoft Research Asia, Beijing, China
Volume
1
fYear
2003
fDate
6-10 April 2003
Abstract
The hidden dynamic model (HDM) has been an attractive acoustic modeling approach because it provides a computational model for coarticulation and the dynamics of human speech. However, the lack of a direct decoding algorithm has been a barrier to research progress on HDM. We have developed a new HDM-based acoustic model, the hidden-trajectory HMM (HTHMM), which combines the state/mixture topology of a traditional monophone HMM with a target-directed hidden-trajectory model (a special form of HDM) for coarticulation modeling. Because the classical Viterbi algorithm is not admissible, we have developed a novel MAP decoding algorithm for HTHMM that correctly takes the hidden continuous trajectory into account. This paper introduces our new HTHMM decoder, which allows us, for the first time, to evaluate an HDM-type model by direct decoding instead of N-best rescoring. Using direct decoding, we demonstrate that the coarticulatory mechanism of our HTHMM matches traditional context-dependent modeling (enumeration of model parameters): the context-independent HTHMM has slightly better accuracy than a crossword-triphone HMM on the Aurora2 task. The decoder also enables us to incorporate state-boundary optimization into the HDM/HTHMM training procedure. This paper presents the detailed decoding algorithm and evaluation results, while in Zhou et al. (2003) we present the HTHMM model itself and parameter training.
Keywords
hidden Markov models; maximum likelihood decoding; speech processing; speech recognition; Aurora2 task; HDM; HMM; HTHMM; MAP decoding; acoustic modeling; coarticulation modeling; direct decoding; embedding; hidden continuous trajectory; hidden dynamic model; hidden-trajectory HMM; monophone HMM; state-boundary optimization; state/mixture topology; target-directed hidden trajectory model; training procedure; Asia; Computational modeling; Context modeling; Decoding; Hidden Markov models; Humans; Speech analysis; Topology; Trajectory; Viterbi algorithm
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on
ISSN
1520-6149
Print_ISBN
0-7803-7663-3
Type
conf
DOI
10.1109/ICASSP.2003.1198889
Filename
1198889