DocumentCode :
382026
Title :
Audio-visual continuous speech recognition using MPEG-4 compliant visual features
Author :
Aleksic, Petar S. ; Williams, Jay J. ; Wu, Zhilin ; Katsaggelos, Aggelos K.
Author_Institution :
Dept. of Electr. & Comput. Eng., Northwestern Univ., Evanston, IL, USA
Volume :
1
fYear :
2002
fDate :
2002
Abstract :
We utilize facial animation parameters (FAPs), supported by the MPEG-4 standard for the visual representation of speech, in order to improve automatic speech recognition (ASR) significantly. We describe a robust and automatic algorithm for extraction of FAPs from visual data that requires no hand labeling or extensive training procedures. Multi-stream hidden Markov models (HMMs) are used to integrate audio and visual information. ASR experiments are performed under both clean and noisy audio conditions using a relatively large vocabulary (approximately 1000 words). The proposed system reduces the word error rate (WER) by 20% to 23% relative to the audio-only ASR WER at various SNRs with additive white Gaussian noise, and by 19% relative to the audio-only ASR WER under clean audio conditions.
Keywords :
AWGN; acoustic noise; audio signal processing; error statistics; feature extraction; hidden Markov models; speech recognition; video signal processing; visual communication; ASR; HMM; MPEG-4 FAP; SNR; additive white Gaussian noise; audio-visual speech recognition; automatic speech recognition; continuous speech recognition; facial animation parameters; word error rate; Data mining; Facial animation; Labeling; MPEG 4 Standard; Robustness; Vocabulary;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Image Processing. 2002. Proceedings. 2002 International Conference on
ISSN :
1522-4880
Print_ISBN :
0-7803-7622-6
Type :
conf
DOI :
10.1109/ICIP.2002.1038187
Filename :
1038187