Title :
Models for audiovisual fusion in a noisy-vowel recognition task
Author :
Teissier, P. ; Schwartz, Jean-Luc ; Guérin-Dugué, Anne
Author_Institution :
ICP & LTIRF, INPG, Grenoble, France
Abstract :
This paper compares four basic architectures for audiovisual speech recognition in a noisy-vowel recognition task. Given a contextual input (the signal-to-noise ratio), three of the four architectures satisfy the “synergy” criterion, meaning that audiovisual (AV) recognition is better than either audio-alone (A) or visual-alone (V) recognition, both globally and for each individual phonetic feature. Without the contextual input, performance collapses. For one model, however, we propose an original approach based on efficient non-linear data processing, which leads to simpler algorithms and improves the performance of the audiovisual fusion operator.
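The “synergy” criterion in the abstract can be stated as a simple check: for the global score and for every phonetic feature, the AV score must exceed the better of the A and V scores. The sketch below assumes that reading (AV strictly greater than max(A, V)), and it assumes scores are given as dictionaries keyed by a hypothetical label such as "global" or a feature name. The function name `satisfies_synergy` and all score values are illustrative only; they are not taken from the paper.

```python
from typing import Mapping


def satisfies_synergy(
    av: Mapping[str, float],
    a: Mapping[str, float],
    v: Mapping[str, float],
) -> bool:
    """Hypothetical check of the synergy criterion.

    Each mapping holds recognition scores (higher is better) keyed by a
    hypothetical label: "global" for overall recognition, plus one key per
    phonetic feature. Synergy is read here as AV > max(A, V) for every key.
    """
    # All three conditions must be scored on the same set of criteria.
    if not (av.keys() == a.keys() == v.keys()):
        raise ValueError("AV, A and V scores must cover the same keys")
    # AV must beat the better unimodal score globally and per feature.
    return all(av[k] > max(a[k], v[k]) for k in av)


# Invented illustration values, not results from the paper.
av_scores = {"global": 0.90, "rounding": 0.92}
a_scores = {"global": 0.70, "rounding": 0.80}
v_scores = {"global": 0.60, "rounding": 0.85}
print(satisfies_synergy(av_scores, a_scores, v_scores))
```

Requiring the criterion per feature, not only globally, is what makes it strict: a fused system can raise its overall score while still doing worse than one modality on a single feature, and this check would then return False.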
Keywords :
audio-visual systems; sensor fusion; speech recognition; audiovisual fusion; audiovisual recognition; audiovisual speech; noisy-vowel recognition; phonetic feature; Acoustic noise; Automatic speech recognition; Context modeling; Gaussian noise; Humans; Lips; Sensor fusion; Signal to noise ratio; Speech recognition; Tongue;
Conference_Title :
IEEE First Workshop on Multimedia Signal Processing, 1997
Conference_Location :
Princeton, NJ
Print_ISBN :
0-7803-3780-8
DOI :
10.1109/MMSP.1997.602610