DocumentCode :
1995742
Title :
VidTIMIT audio visual phoneme recognition using AAM visual features and human auditory motivated acoustic wavelet features
Author :
Biswas, Astik ; Sahu, P.K. ; Bhowmick, Anirban ; Chandra, Mahesh
Author_Institution :
Dept. of Electr. Eng., Nat. Inst. of Technol., Rourkela, India
fYear :
2015
fDate :
9-11 July 2015
Firstpage :
428
Lastpage :
433
Abstract :
This paper presents an audio-visual phoneme recognition system that uses shape and appearance information extracted from the jaw and lip region to improve robustness in noisy environments. Combining visual features with traditional acoustic features has been found to be promising in complex auditory environments. The visual modality can provide complementary information to the speech recognizer when the audio modality is badly affected by background noise. The acoustic modality is represented by auditory-based, equivalent rectangular bandwidth (ERB)-like wavelet features (WERBC), whereas the visual modality is represented by statistically powerful active appearance model (AAM) based features. The audio and visual modalities are fused using a proportional weighting factor to form a two-stream audio-visual synchronous hidden Markov model (SHMM) recognizer. The VidTIMIT database is chosen to study the performance of the multimodal phoneme recognition system. Artificial noise is injected into the audio files at different SNR levels (0 dB to 20 dB) to study the performance of the system in noisy environments. The combination of WERBC and AAM features outperforms the well-known traditional combination of Mel-frequency cepstral coefficient (MFCC) acoustic features and discrete cosine transform (DCT) visual features.
Keywords :
audio signal processing; hidden Markov models; speech recognition; AAM based features; AAM feature; AAM visual features; SHMM recognizer; VidTIMIT audio visual phoneme recognition; VidTIMIT database; WERBC feature; active appearance model based features; artificial noises; audio modality; audio visual phoneme recognition system; auditory based equivalent rectangular bandwidth; human auditory motivated acoustic wavelet features; multimodal phoneme recognition system; proportional weighting factor; speech recognizer; stream audio visual synchronous hidden Markov model; visual modality; Acoustics; Active appearance model; Feature extraction; Hidden Markov models; Shape; Speech; Visualization; AAM; Audio visual phoneme recognition; HCI; Vid-TIMIT Corpus; WERBC;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Recent Trends in Information Systems (ReTIS), 2015 IEEE 2nd International Conference on
Conference_Location :
Kolkata
Type :
conf
DOI :
10.1109/ReTIS.2015.7232917
Filename :
7232917