Title :
Unified frame and segment based models for automatic speech recognition
Author :
Hon, Hsiao-Wuen ; Wang, Kuansan
Author_Institution :
Speech Technol. Group, Microsoft Res., Redmond, WA, USA
Abstract :
In this paper, we propose an analytically tractable framework that integrates frame-based and segment-based acoustic modeling techniques. We combine the two approaches by jointly modeling their respective hidden Markov processes. Since the joint process is based on the same mathematical framework, conventional search and training techniques, such as the Viterbi and EM algorithms, can be applied directly. The framework also allows the score from either model to contribute to the training and decoding of the other, reaching a jointly optimal decision. We conducted two series of experiments to verify our hypotheses. In the phone-pair classification experiments, our segment models show a 24% error reduction over a state-of-the-art HMM-based system. The superior quality of the segment models contributes to an 8.2% reduction in word error rate for the unified system on the WSJ dictation task.
Keywords :
hidden Markov models; maximum likelihood estimation; pattern classification; speech recognition; EM algorithms; HMM; Viterbi algorithms; WSJ dictation task; acoustic modeling techniques; analytically tractable framework; automatic speech recognition; frame based models; hidden Markov processes; jointly optimal decision; mixture trajectory models; phone-pair classification; search techniques; segment based models; training techniques; unified system; word error rates; Automatic speech recognition; Decoding; Error analysis; Hidden Markov models; Pattern recognition; Samarium; Speech analysis; Speech recognition; Viterbi algorithm; Vocabulary
Conference_Title :
Acoustics, Speech, and Signal Processing, 2000. ICASSP '00. Proceedings. 2000 IEEE International Conference on
Conference_Location :
Istanbul
Print_ISBN :
0-7803-6293-4
DOI :
10.1109/ICASSP.2000.859135