DocumentCode
3477191
Title
A hybrid visual feature extraction method for audio-visual speech recognition
Author
Wu, Guanyong ; Zhu, Jie ; Xu, Haihua
Author_Institution
Dept. of Electron. Eng., Shanghai Jiao Tong Univ., Shanghai, China
fYear
2009
fDate
7-10 Nov. 2009
Firstpage
1829
Lastpage
1832
Abstract
In this paper, a hybrid visual feature extraction method that combines extended locally linear embedding (LLE) with visemic linear discriminant analysis (LDA) is presented for audio-visual speech recognition (AVSR). First, the extended LLE is used to reduce the dimension of the mouth images: it constrains the search for each mouth image's neighborhood to the corresponding individual's dataset instead of the whole dataset, and then maps the high-dimensional mouth image matrices into a low-dimensional Euclidean space. Second, the feature vectors are projected onto the visemic linear discriminant space to find the optimal classification. Finally, in the audio-visual fusion period, minimum classification error (MCE) training based on segmental generalized probabilistic descent (GPD) is applied to optimize the audio and visual stream weights. Experimental results on the CUAVE database show that the proposed method achieves a significant performance improvement over the classical PCA and LDA based methods in visual-only speech recognition. Further experimental results show the robustness of the MCE based discriminative training method in noisy environments.
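The neighborhood restriction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `constrained_neighbors`, the Euclidean distance, and the flattened-image input layout are all assumptions made for illustration. It shows only the first step of an LLE-style method, selecting each sample's k nearest neighbours from the same individual's data rather than from the whole dataset.

```python
import numpy as np

def constrained_neighbors(X, speaker_ids, k):
    """For each row of X, return the indices of its k nearest neighbours,
    drawn only from rows that share the same speaker id.

    X           : (n_samples, n_features) array, e.g. flattened mouth images
                  (hypothetical layout, assumed for this sketch)
    speaker_ids : length-n_samples sequence of speaker labels
    k           : number of neighbours per sample
    """
    X = np.asarray(X, dtype=float)
    speaker_ids = np.asarray(speaker_ids)
    neighbors = np.empty((X.shape[0], k), dtype=int)
    for i in range(X.shape[0]):
        # Candidates: samples from the same speaker, excluding the sample itself.
        candidates = np.flatnonzero(speaker_ids == speaker_ids[i])
        candidates = candidates[candidates != i]
        if candidates.size < k:
            raise ValueError("speaker has fewer than k other samples")
        # Euclidean distance is an assumption; the abstract does not name a metric.
        dists = np.linalg.norm(X[candidates] - X[i], axis=1)
        neighbors[i] = candidates[np.argsort(dists, kind="stable")[:k]]
    return neighbors
```

With a standard LLE, sample 0 in a toy set such as `X = [[0.0], [0.1], [10.0], ...]` would pick sample 1 as its neighbour; under this restriction it can only pick samples recorded from the same speaker, even when they are farther away.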
Keywords
audio-visual systems; feature extraction; matrix algebra; principal component analysis; probability; signal classification; speech recognition; vectors; CUAVE database; Euclidean space; LDA; MCE based discriminative training method; PCA; audio-visual fusion period; audio-visual speech recognition; feature vectors; hybrid visual feature extraction method; locally linear embedding; minimum classification error training; mouth data neighborhood; mouth image matrices; mouth images; noisy environment; optimal classification; segmental generalized probabilistic descent; visemic linear discriminant analysis; visemic linear discriminant space; visual-only speech recognition; Feature extraction; Image segmentation; Linear discriminant analysis; Mouth; Principal component analysis; Spatial databases; Speech recognition; Streaming media; Vectors; Visual databases; audiovisual speech recognition (AVSR); locally linear embedding (LLE); minimum classification error (MCE);
fLanguage
English
Publisher
IEEE
Conference_Titel
Image Processing (ICIP), 2009 16th IEEE International Conference on
Conference_Location
Cairo
ISSN
1522-4880
Print_ISBN
978-1-4244-5653-6
Electronic_ISBN
1522-4880
Type
conf
DOI
10.1109/ICIP.2009.5413573
Filename
5413573