DocumentCode :
1858343
Title :
An asynchronous DBN for audio-visual speech recognition
Author :
Saenko, K. ; Livescu, K.
Author_Institution :
Comput. Sci. & Artificial Intell. Lab., Massachusetts Inst. of Technol., Cambridge, MA
fYear :
2006
fDate :
10-13 Dec. 2006
Firstpage :
154
Lastpage :
157
Abstract :
We investigate an asynchronous two-stream dynamic Bayesian network-based model for audio-visual speech recognition. The model allows the audio and visual streams to de-synchronize within the boundaries of each word. The probability of de-synchronization by a given number of states is learned during training. This type of asynchrony has been previously used for pronunciation modeling and for visual speech recognition (lipreading); however, this is its first application to audio-visual speech recognition. We evaluate the model on an audio-visual corpus of English digits (CUAVE) with different levels of added acoustic noise, and compare it to several baselines. The asynchronous model outperforms audio-only and synchronous audio-visual baselines. We also compare models with different degrees of allowed asynchrony and find that the lowest error rate on this task is achieved when the audio and visual streams are allowed to de-synchronize by up to two states.
Keywords :
audio signal processing; belief networks; speech recognition; English digits; asynchronous DBN; asynchronous two-stream dynamic Bayesian network-based model; audio-visual corpus; audio-visual speech recognition; lipreading; pronunciation modeling; Acoustic noise; Artificial intelligence; Automatic speech recognition; Bayesian methods; Computer science; Feature extraction; Hidden Markov models; Laboratories; Speech recognition; Streaming media
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Spoken Language Technology Workshop, 2006. IEEE
Conference_Location :
Palm Beach
Print_ISBN :
1-4244-0872-5
Type :
conf
DOI :
10.1109/SLT.2006.326841
Filename :
4123385