DocumentCode :
3157886
Title :
A Robust Visual Feature Extraction based BTSM-LDA for Audio-Visual Speech Recognition
Author :
Lv, Guoyun ; Zhao, Rongchun ; Jiang, Dongmei ; Li, Yan ; Sahli, H.
Author_Institution :
Northwestern Polytech. Univ., Xi'an
fYear :
2007
fDate :
22-24 Aug. 2007
Firstpage :
637
Lastpage :
641
Abstract :
The asynchrony between speech and lip movement is a key problem for audio-visual speech recognition (AVSR) systems. A multi-stream asynchrony dynamic Bayesian network (MS-ADBN) model is proposed for audio-visual speech recognition. Compared with the multi-stream HMM (MSHMM), the MS-ADBN model describes the asynchrony between the audio stream and the visual stream at the word level. In addition, based on the lip profile obtained with a Bayesian tangent shape model (BTSM), linear discriminant analysis (LDA) is used for visual feature extraction; it describes the dynamic features of the lip and removes redundancy in the lip geometrical features. Experimental results on a continuous-digit audio-visual database show that lip dynamic features based on BTSM and LDA are more stable and robust than direct lip geometrical features. In noisy environments with signal-to-noise ratios ranging from 0 dB to 30 dB, the MS-ADBN model with MFCC and LDA visual features achieves an average improvement of 4.92% in speech recognition rate over the MSHMM.
Keywords :
Bayes methods; feature extraction; hidden Markov models; speech recognition; Bayesian tangent shape model; audio-visual speech recognition; linear discrimination analysis; multi-stream asynchrony dynamic Bayesian network; multi-stream hidden Markov model; robust visual feature extraction; Bayesian methods; Feature extraction; Hidden Markov models; Linear discriminant analysis; Robustness; Shape; Signal to noise ratio; Solid modeling; Speech recognition; Streaming media; Bayesian Tangent Shape Model; Dynamic Bayesian Networks; audio-visual; speech recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Communications and Networking in China, 2007. CHINACOM '07. Second International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-1-4244-1009-5
Electronic_ISBN :
978-1-4244-1009-5
Type :
conf
DOI :
10.1109/CHINACOM.2007.4469472
Filename :
4469472