DocumentCode :
706845
Title :
Fusion of audio and video data by neural networks for robust vowel recognition
Author :
Kroschel, K. ; Mekhaiel, M.S. ; Berthommier, F.
Author_Institution :
Univ. Karlsruhe (Tech. Hochschule), Karlsruhe, Germany
fYear :
1999
fDate :
Aug. 31 1999-Sept. 3 1999
Firstpage :
3029
Lastpage :
3034
Abstract :
The performance of speech recognition systems decreases dramatically in noisy environments. A robust human-computer interaction system should therefore make use of both acoustic and visual signals. In this paper we present an automatic vowel recognition system which can perform in quasi real-time. We focus mainly on the recognition of 5 different German vowels (a, e, i, o, u) and their corresponding visemes (images). First, the position of the continuously moving face is determined. The speech parameters of the spoken vowel, along with the model parameters of the lip image, are then fed to a neural network to recognize the uttered vowel. The face tracking is shape-independent and hence imposes no special requirements on the color or shape of the face.
Keywords :
audio signal processing; face recognition; human computer interaction; natural language processing; neural nets; speech recognition; video signal processing; German vowel recognition; German vowel visemes; audio data fusion; automatic vowel recognition system; continuously moving face; face color; face shape; lip image; neural networks; robust human-computer interaction system; robust vowel recognition; speech parameters; speech recognition systems; video data fusion; data fusion; lipreading; neural networks; vowel recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Control Conference (ECC), 1999 European
Conference_Location :
Karlsruhe
Print_ISBN :
978-3-9524173-5-5
Type :
conf
Filename :
7099790