DocumentCode
417299
Title
DBN based multi-stream models for audio-visual speech recognition
Author
Gowdy, John N. ; Subramanya, Amarnag ; Bartels, Chris ; Bilmes, Jeff
Author_Institution
Clemson Univ., SC, USA
Volume
1
fYear
2004
fDate
17-21 May 2004
Abstract
In this paper, we propose a model based on dynamic Bayesian networks (DBN) to integrate information from multiple audio and visual streams. We compare the DBN-based system (implemented using the Graphical Model Toolkit (GMTK)) with a classical HMM (implemented in the Hidden Markov Model Toolkit (HTK)) for both the single-stream and two-stream integration problems. We also propose a new model (mixed integration) to integrate information from three or more streams derived from different modalities and compare the new model's performance with that of a synchronous integration scheme. A new technique to estimate stream confidence measures for the integration of three or more streams is also developed and implemented. Results from our implementation using the Clemson University Audio Visual Experiments (CUAVE) database indicate an absolute improvement of about 4% in word accuracy in the -4 to 10 dB average case for the mixed integration models over the synchronous models when using two audio streams and one video stream.
Keywords
Bayes methods; belief networks; speech recognition; CUAVE database; Clemson University Audio Visual Experiments database; DBN; GMTK; Graphical Model Toolkit; HMM; HTK; Hidden Markov Model Toolkit; audio-visual speech recognition; dynamic Bayesian networks; multi-stream models; performance; stream confidence measure estimation; Active noise reduction; Audio databases; Bayesian methods; Graphical models; Hidden Markov models; Random variables; Speech recognition; Streaming media; Visual databases; Working environment noise;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
ISSN
1520-6149
Print_ISBN
0-7803-8484-9
Type
conf
DOI
10.1109/ICASSP.2004.1326155
Filename
1326155