• DocumentCode
    706845
  • Title
    Fusion of audio and video data by neural networks for robust vowel recognition
  • Author
    Kroschel, K. ; Mekhaiel, M.S. ; Berthommier, F.

  • Author_Institution
    Univ. Karlsruhe (Tech. Hochschule), Karlsruhe, Germany
  • fYear
    1999
  • fDate
    Aug. 31 1999-Sept. 3 1999
  • Firstpage
    3029
  • Lastpage
    3034
  • Abstract
    The performance of speech recognition systems decreases dramatically in noisy environments. A robust human-computer interaction system should therefore make use of both acoustic and visual signals. In this paper we present an automatic vowel recognition system which can perform in quasi real-time. We focus mainly on the recognition of 5 different German vowels (a, e, i, o, u) and their corresponding visemes (images). First, the position of the continuously moving face is determined. The speech parameters of the spoken vowel, along with the model parameters of the lip's image, are then fed to a neural network to recognize the uttered vowel. The face tracking is shape-independent, and hence there are no special requirements concerning the color or shape of the face.
  • Keywords
    audio signal processing; face recognition; human computer interaction; natural language processing; neural nets; speech recognition; video signal processing; German vowel recognition; German vowel visemes; audio data fusion; automatic vowel recognition system; continuously moving face; face color; face shape; lip image; neural networks; robust human-computer interaction system; robust vowel recognition; speech parameters; speech recognition systems; video data fusion; data fusion; lipreading; vowel recognition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Control Conference (ECC), 1999 European
  • Conference_Location
    Karlsruhe
  • Print_ISBN
    978-3-9524173-5-5
  • Type
    conf
  • Filename
    7099790