DocumentCode
706845
Title
Fusion of audio and video data by neural networks for robust vowel recognition
Author
Kroschel, K. ; Mekhaiel, M.S. ; Berthommier, F.
Author_Institution
Univ. Karlsruhe (Tech. Hochschule), Karlsruhe, Germany
fYear
1999
fDate
Aug. 31 1999-Sept. 3 1999
Firstpage
3029
Lastpage
3034
Abstract
The performance of speech recognition systems decreases dramatically in noisy environments. A robust human-computer interaction system should therefore make use of both acoustic and visual signals. In this paper we present an automatic vowel recognition system that can operate in quasi real time. We focus mainly on the recognition of 5 different German vowels (a, e, i, o, u) and their corresponding visemes (images). First, the position of the continuously moving face is determined. The speech parameters of the spoken vowel, along with the model parameters of the lip image, are then fed to a neural network to recognize the uttered vowel. The face tracking is shape-independent and hence imposes no special requirements on the color or shape of the face.
Keywords
audio signal processing; face recognition; human computer interaction; natural language processing; neural nets; speech recognition; video signal processing; German vowel recognition; German vowel visemes; audio data fusion; automatic vowel recognition system; continuously moving face; face color; face shape; lip image; neural networks; robust human-computer interaction system; robust vowel recognition; speech parameters; speech recognition systems; video data fusion; data fusion; lipreading; vowel recognition
fLanguage
English
Publisher
IEEE
Conference_Titel
Control Conference (ECC), 1999 European
Conference_Location
Karlsruhe
Print_ISBN
978-3-9524173-5-5
Type
conf
Filename
7099790