Title :
Audio-visual speech modeling using coupled hidden Markov models
Author :
Chu, Stephen M. ; Huang, Thomas S.
Author_Institution :
Beckman Institute and Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, USA
Abstract :
In this work, we consider the bimodal fusion problem in audio-visual speech recognition. A novel sensory fusion architecture based on coupled hidden Markov models (CHMMs) is presented. CHMMs are directed graphical models of stochastic processes and are a special type of dynamic Bayesian network. The proposed fusion architecture allows us to address the statistical modeling and the fusion of audio-visual speech in a unified framework. Furthermore, the architecture is capable of capturing the asynchronous and temporal inter-modal dependencies between the two information channels. We describe a model transformation strategy to facilitate inference and learning in CHMMs. Results from audio-visual speech recognition experiments confirm the superior capability of the proposed fusion architecture.
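The abstract does not say which model transformation the authors use. As an illustration only, the sketch below shows one commonly used option: rewriting a two-chain CHMM as an ordinary HMM over the product state space, so that the standard forward algorithm applies. Every name and number here (`trans_a`, `trans_v`, `to_product_hmm`, the toy probabilities, the discrete binary observations, and the independent initial distributions) is a hypothetical assumption and is not taken from the paper.

```python
from itertools import product

# Hypothetical toy CHMM: 2 audio states, 2 visual states, binary observations.
N_A, N_V = 2, 2

# Initial distributions, assumed independent across the two chains.
pi_a = [0.6, 0.4]
pi_v = [0.5, 0.5]

# Coupled transitions: each chain's next state depends on BOTH previous states.
# trans_a[a_prev][v_prev][a_next] = P(a_next | a_prev, v_prev)
trans_a = [[[0.7, 0.3], [0.5, 0.5]],
           [[0.2, 0.8], [0.4, 0.6]]]
# trans_v[a_prev][v_prev][v_next] = P(v_next | a_prev, v_prev)
trans_v = [[[0.9, 0.1], [0.3, 0.7]],
           [[0.6, 0.4], [0.1, 0.9]]]

# Discrete emissions: emit_a[a][o] = P(o | a), emit_v[v][o] = P(o | v).
emit_a = [[0.8, 0.2], [0.3, 0.7]]
emit_v = [[0.6, 0.4], [0.1, 0.9]]


def to_product_hmm():
    """Build an HMM whose states are (audio, visual) pairs."""
    states = list(product(range(N_A), range(N_V)))
    init = [pi_a[a] * pi_v[v] for a, v in states]
    # Given the previous joint state, the two chains move independently,
    # so the joint transition is the product of the per-chain transitions.
    trans = [[trans_a[a][v][a2] * trans_v[a][v][v2] for a2, v2 in states]
             for a, v in states]
    return states, init, trans


def forward_likelihood(obs):
    """Forward algorithm on the product HMM; obs is a list of (o_audio, o_visual)."""
    states, init, trans = to_product_hmm()
    n = len(states)

    def emit(s, o):
        a, v = states[s]
        return emit_a[a][o[0]] * emit_v[v][o[1]]

    alpha = [init[s] * emit(s, obs[0]) for s in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[r] * trans[r][s] for r in range(n)) * emit(s, o)
                 for s in range(n)]
    return sum(alpha)


def brute_force_likelihood(obs):
    """Sum over every pair of state paths directly in the CHMM (for checking)."""
    steps = len(obs)
    total = 0.0
    for a_path in product(range(N_A), repeat=steps):
        for v_path in product(range(N_V), repeat=steps):
            p = pi_a[a_path[0]] * pi_v[v_path[0]]
            for t in range(steps):
                if t > 0:
                    p *= trans_a[a_path[t - 1]][v_path[t - 1]][a_path[t]]
                    p *= trans_v[a_path[t - 1]][v_path[t - 1]][v_path[t]]
                p *= emit_a[a_path[t]][obs[t][0]] * emit_v[v_path[t]][obs[t][1]]
            total += p
    return total
```

On a short sequence such as `[(0, 1), (1, 1), (0, 0)]`, `forward_likelihood` and `brute_force_likelihood` should agree up to floating-point error. The product state space grows as the product of the per-chain state counts, which is the main cost of this kind of transformation.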
Keywords :
Fuses; Hidden Markov models; Humans; Lead; Robustness; Speech; Vocabulary;
Conference_Title :
Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on
Conference_Location :
Orlando, FL, USA
Print_ISBN :
0-7803-7402-9
DOI :
10.1109/ICASSP.2002.5745026