DocumentCode :
2381109
Title :
Comparison of MPEG-4 facial animation parameter groups with respect to audio-visual speech recognition performance
Author :
Aleksic, Petar S. ; Katsaggelos, Aggelos K.
Author_Institution :
Dept. of Electr. & Comput. Eng., Northwestern Univ., Evanston, IL, USA
Volume :
3
fYear :
2005
fDate :
11-14 Sept. 2005
Abstract :
In this paper, we describe an audio-visual automatic speech recognition (AV-ASR) system that utilizes facial animation parameters (FAPs), supported by the MPEG-4 standard, for the visual representation of speech. We describe the visual feature extraction algorithms used for extracting the FAPs that control outer- and inner-lip movement. Principal component analysis (PCA) is performed on both the inner- and outer-lip FAP vectors in order to decrease their dimensionality and decorrelate them. The PCA-based projection weights of the extracted FAP vectors are used as visual features. Multi-stream hidden Markov models (HMMs) and a late integration approach are used to integrate audio and visual information and train a continuous AV-ASR system. We compare the performance of the developed AV-ASR system utilizing outer- and inner-lip FAPs, individually and jointly. Experiments were performed for different dimensionalities of the visual features, at various SNRs (0-30 dB) with additive white Gaussian noise, on a relatively large vocabulary (approximately 1000 words) database. The proposed system reduces the word error rate (WER) by 20% to 23% relative to audio-only ASR WERs. Conclusions are drawn on the individual and combined effectiveness of the inner- and outer-lip FAPs, and on the trade-off between the dimensionality of the visual features and the amount of speechreading information contained in them, together with its influence on AV-ASR performance.
Keywords :
AWGN; computer animation; face recognition; hidden Markov models; image representation; principal component analysis; speech recognition; MPEG-4 facial animation parameter; additive white Gaussian noise; audio-visual speech recognition performance; facial animation parameters; multistream hidden Markov models; speech representation; visual feature extraction algorithms; visual representation; word error rate; Automatic control; Automatic speech recognition; Decorrelation; Facial animation; Feature extraction; MPEG 4 Standard
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Image Processing, 2005. ICIP 2005. IEEE International Conference on
Print_ISBN :
0-7803-9134-9
Type :
conf
DOI :
10.1109/ICIP.2005.1530438
Filename :
1530438