Asynchronous integration of audio and visual sources in bi-modal automatic speech recognition

Author

Deleglise, Paul ; Rogozan, Alexandrina ; Alissali, Mamoun

Author_Institution

LIUM, University of Maine, Av. Olivier Messiaen, BP 535, 72017 Le Mans Cedex, France

fYear

1996

fDate

10-13 Sept. 1996

Firstpage

Lastpage

Abstract

This paper presents our work on integrating visual data into automatic speech recognition systems. We aim in particular at solving two problems: (1) the classification differences between the modeling of acoustic information (phonemes) and visual information (visemes); and (2) the anticipation and retention of visemes relative to the corresponding phonemes. We developed and tested three systems, each dealing with one or both problems and proposing a different integration strategy. A comparison of the systems' performance shows that some of the solutions we propose give satisfactory results, and suggests that further work on some of the others would lead to additional performance improvements.

Keywords

Acoustics; Hidden Markov models; Noise; Shape; Speech; Speech recognition; Visualization;

fLanguage

English

Publisher

IEEE

Conference_Titel

European Signal Processing Conference, 1996. EUSIPCO 1996. 8th

Conference_Location

Trieste, Italy

Print_ISBN

978-888-6179-83-6

Type

conf

Filename

7083212

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=701486