Title :
Real-time lip tracking and bimodal continuous speech recognition
Author :
Chan, Michael T. ; Zhang, You ; Huang, Thomas S.
Author_Institution :
Rockwell Sci. Center, Thousand Oaks, CA, USA
Abstract :
We investigate a bimodal approach to speech recognition that incorporates additional visual features derived from the speaker's lip movements. A reference contour model is used to track the speaker's lip outline. By using color, constraining the contour deformation to an affine subspace, and incorporating an outlier rejection mechanism, our system is robust and runs in real time. To address the model initialization issue, a fast lip localization algorithm is also incorporated. A sample of continuous bimodal speech data with a confined vocabulary (suited to our application area) was captured synchronously for training and testing. Using the hidden Markov modeling framework, we trained our bimodal context-dependent, sub-word-based recognizer in several different ways. The experiments show that the bimodal recognizer compares favorably with its acoustic-only counterpart. The results also indicate that it is advantageous to include the first derivatives of the visual features. Furthermore, the 2-stream modeling scheme appears to be preferable to the 1-stream scheme for bimodal speech.
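The abstract reports that appending first derivatives of the visual features helps, and that modeling audio and video as two streams works better than a single concatenated stream. The sketch below illustrates both ideas under assumptions that are not taken from the paper. Deltas use the common regression formula (window 2, edge frames replicated). Each stream is scored by a single diagonal-covariance Gaussian rather than a mixture. The stream weight `w_audio = 0.7` is a hypothetical placeholder. All function names are hypothetical as well.

```python
import math
from typing import List, Sequence, Tuple

# A per-state, per-stream density: (means, variances) of a diagonal Gaussian.
# Single-Gaussian densities are a simplifying assumption; HMM recognizers
# usually use Gaussian mixtures.
Gaussian = Tuple[Sequence[float], Sequence[float]]


def delta_features(frames: Sequence[Sequence[float]], window: int = 2) -> List[List[float]]:
    """First derivatives via the regression formula
    d_t = sum_k k * (c_{t+k} - c_{t-k}) / (2 * sum_k k^2), k = 1..window,
    replicating the first/last frame at the sequence edges."""
    T, dim = len(frames), len(frames[0])
    denom = 2 * sum(k * k for k in range(1, window + 1))
    deltas = []
    for t in range(T):
        d = [0.0] * dim
        for k in range(1, window + 1):
            ahead = frames[min(t + k, T - 1)]
            behind = frames[max(t - k, 0)]
            for i in range(dim):
                d[i] += k * (ahead[i] - behind[i])
        deltas.append([v / denom for v in d])
    return deltas


def append_deltas(frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """Static visual features followed by their first derivatives, per frame."""
    return [list(c) + d for c, d in zip(frames, delta_features(frames))]


def diag_gauss_loglik(x: Sequence[float], g: Gaussian) -> float:
    """Log-density of x under a diagonal-covariance Gaussian."""
    mean, var = g
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def two_stream_loglik(audio: Sequence[float], visual: Sequence[float],
                      audio_g: Gaussian, visual_g: Gaussian,
                      w_audio: float = 0.7) -> float:
    """State log-likelihood with separately modeled streams:
    log b(o) = w_a * log b_a(o_a) + w_v * log b_v(o_v), with w_v = 1 - w_a.
    The default weight is a hypothetical placeholder, not from the paper."""
    w_visual = 1.0 - w_audio
    return (w_audio * diag_gauss_loglik(audio, audio_g)
            + w_visual * diag_gauss_loglik(visual, visual_g))


if __name__ == "__main__":
    # Toy lip-width track growing linearly: its delta is 1.0 per frame
    # away from the sequence edges.
    lip_width = [[0.0], [1.0], [2.0], [3.0], [4.0]]
    obs = append_deltas(lip_width)
    print(obs[2])  # [2.0, 1.0]

    # Score one frame against toy audio/visual state densities whose means
    # match the observations exactly.
    audio_g = ([0.0, 0.0], [1.0, 1.0])
    visual_g = ([2.0, 1.0], [1.0, 1.0])
    print(round(two_stream_loglik([0.0, 0.0], obs[2], audio_g, visual_g), 4))  # -1.8379
```

With single diagonal Gaussians and both weights set to 1, the 2-stream score reduces to the 1-stream score on the concatenated vector. The two schemes diverge once each stream has its own mixture and the weights differ from 1. This lets a 2-stream model lean on whichever modality is more reliable.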
Keywords :
edge detection; feature extraction; hidden Markov models; image colour analysis; image motion analysis; real-time systems; speech recognition; tracking; 1-stream; 2-stream modeling; affine subspace; bimodal continuous speech recognition; color; confined vocabulary; context-dependent sub-word-based recognizer; continuous bimodal speech data; experiments; fast lip localization algorithm; first derivatives; hidden Markov modeling; model initialization; modeling; outlier rejection mechanism; real time system; real-time lip tracking; reference contour model; visual features; Acoustic noise; Automatic speech recognition; Context modeling; Hidden Markov models; Lips; Speech recognition; Subspace constraints; Training data; Vocabulary; Working environment noise;
Conference_Title :
Multimedia Signal Processing, 1998 IEEE Second Workshop on
Conference_Location :
Redondo Beach, CA
Print_ISBN :
0-7803-4919-9
DOI :
10.1109/MMSP.1998.738914