DocumentCode
2704401
Title
Dynamic Stream Weight Modeling for Audio-Visual Speech Recognition
Author
Marcheret, E. ; Libal, V. ; Potamianos, Gerasimos
Author_Institution
IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA
Volume
4
fYear
2007
fDate
15-20 April 2007
Abstract
To generate optimal multi-stream audio-visual speech recognition performance, appropriate dynamic weighting of each modality is desired. In this paper, we propose to estimate such weights based on a combination of acoustic signal space observations and single-modality audio and visual speech model likelihoods. Two modeling approaches are investigated for such weight estimation: one based on a sigmoid fitting function, the other employing Gaussian mixture models. Reported experiments demonstrate that the latter approach outperforms sigmoid-based modeling, and is dramatically superior to the static weighting scheme.
Keywords
Gaussian processes; audio-visual systems; speech processing; speech recognition; Gaussian mixture models; acoustic signal space observations; dynamic stream weight modeling; optimal multistream audio-visual speech recognition; sigmoid fitting function; single-modality audio; static weighting scheme; Automatic speech recognition; Fuses; Hidden Markov models; Linear discriminant analysis; Robustness; Speech processing; Speech recognition; Streaming media; Table lookup; Testing; Audio-Visual Speech Recognition; Multi-Modal Fusion; Multi-Stream HMM; Speech Processing;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on
Conference_Location
Honolulu, HI
ISSN
1520-6149
Print_ISBN
1-4244-0727-3
Type
conf
DOI
10.1109/ICASSP.2007.367227
Filename
4218258