DocumentCode :
382027
Title :
Bimodal fusion in audio-visual speech recognition
Author :
Zhang, Xiaozheng ; Mersereau, Russell M. ; Clements, Mark
Author_Institution :
Center for Signal & Image Process., Georgia Inst. of Technol., Atlanta, GA, USA
Volume :
1
fYear :
2002
fDate :
2002
Abstract :
Extending automatic speech recognition (ASR) to the visual modality has been shown to increase recognition accuracy greatly and improve system robustness over purely acoustic systems, especially in acoustically hostile environments. An important aspect of designing such systems is how to incorporate the visual component into the acoustic speech recognizer to achieve optimal performance. We investigate methods of integrating the audio and visual modalities within HMM-based classification models. We examine existing integration schemes and propose the use of a coupled hidden Markov model (CHMM) to exploit audio-visual interaction. Our experimental results demonstrate that the CHMM consistently outperforms other integration models for a large range of acoustic noise levels and suggest that it better captures temporal correlations between the two streams of information.
Keywords :
acoustic noise; audio signal processing; hidden Markov models; speech recognition; video signal processing; visual communication; ASR; HMM; acoustic noise levels; audio-visual speech recognition; automatic speech recognition; bimodal fusion; classification models; coupled hidden Markov model; temporal correlations; Acoustic noise; Automatic speech recognition; Hidden Markov models; Humans; Microphone arrays; Noise robustness; Speech enhancement; Speech recognition; Topology; Working environment noise;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Image Processing. 2002. Proceedings. 2002 International Conference on
ISSN :
1522-4880
Print_ISBN :
0-7803-7622-6
Type :
conf
DOI :
10.1109/ICIP.2002.1038188
Filename :
1038188