DocumentCode :
3401322
Title :
Models for audiovisual fusion in a noisy-vowel recognition task
Author :
Teissier, P. ; Schwartz, Jean-Luc ; Guérin-Dugué, Anne
Author_Institution :
ICP & LTIRF, INPG, Grenoble, France
fYear :
1997
fDate :
23-25 Jun 1997
Firstpage :
37
Lastpage :
44
Abstract :
This paper compares four basic architectures for handling audiovisual speech in a noisy-vowel recognition task. When provided with contextual input (the signal-to-noise ratio), three of the four architectures satisfy the “synergy” criterion, which means that audiovisual (AV) recognition is better than audio-alone (A) or visual-alone (V) recognition, both in global terms and for each individual phonetic feature. Without contextual input, performance collapses, but for one model we propose an original approach based on efficient non-linear data processing, which leads to simpler algorithms and improves the performance of the audiovisual fusion operator.
Keywords :
audio-visual systems; sensor fusion; speech recognition; audiovisual fusion; audiovisual recognition; audiovisual speech; noisy-vowel recognition; phonetic feature; Acoustic noise; Automatic speech recognition; Context modeling; Gaussian noise; Humans; Lips; Sensor fusion; Signal to noise ratio; Speech recognition; Tongue;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Multimedia Signal Processing, 1997., IEEE First Workshop on
Conference_Location :
Princeton, NJ
Print_ISBN :
0-7803-3780-8
Type :
conf
DOI :
10.1109/MMSP.1997.602610
Filename :
602610