DocumentCode :
2289229
Title :
Speaker independent audio-visual continuous speech recognition
Author :
Liang, Luhong ; Liu, Xiaoxing ; Zhao, Yibao ; Pi, Xiaobo ; Nefian, A.V.
Author_Institution :
Microcomput. Res. Labs., Intel Corp., Santa Clara, CA, USA
Volume :
2
fYear :
2002
fDate :
2002
Firstpage :
25
Abstract :
The growing number of multimedia applications that require robust speech recognition has generated considerable interest in audio-visual speech recognition (AVSR) systems. The use of visual features in AVSR is justified both by the audio-visual nature of speech generation and by the need for features that are invariant to acoustic noise. The speaker-independent audio-visual continuous speech recognition system presented here relies on a robust set of visual features obtained from accurate detection and tracking of the mouth region. The visual and acoustic observation sequences are then integrated using a coupled hidden Markov model (CHMM). The statistical properties of the CHMM allow it to model the asynchrony between audio and visual states while preserving their natural correlation over time. Experimental results on the XM2VTS database show that the system reduces the error rate of the audio-only speech recognition system by over 55% at an SNR of 0 dB.
Keywords :
acoustic noise; audio-visual systems; feature extraction; hidden Markov models; image sequences; speech recognition; ASR systems; CHMM; XM2VTS database; acoustic noise perturbation; acoustic observation sequences; audio-visual continuous speech recognition; correlation; coupled hidden Markov model; error rate reduction; mouth region detection; mouth region tracking; multimedia applications; robust speech recognition systems; speaker independent continuous speech recognition; speech generation; statistical properties; visual features; visual observation sequences; Acoustic noise; Acoustic signal detection; Acoustic testing; Hidden Markov models; Loudspeakers; Mouth; Multimedia systems; Noise robustness; Speech enhancement; Speech recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Multimedia and Expo, 2002. ICME '02. Proceedings. 2002 IEEE International Conference on
Print_ISBN :
0-7803-7304-9
Type :
conf
DOI :
10.1109/ICME.2002.1035365
Filename :
1035365