A Dempster-Shafer Based Fusion Approach for Audio-Visual Speech Recognition with Application to Large Vocabulary French Speech

Author

Foucher, S. ; Laliberté, F. ; Boulianne, G. ; Gagnon, L.

Author_Institution

Dept. of R&D, CRIM

Volume

1

fYear

2006

fDate

14-19 May 2006

Abstract

This work explores a new way of fusing audio and visual information for audio-visual automatic speech recognition in the context of a large vocabulary application. Mouth shape information is extracted off-line and integrated into a speech recognition system using a phoneme-based Dempster-Shafer fusion approach. The fusion methodology assumes that the audio information about the phonemes is a precise Bayesian source, while the visual information is an imprecise evidential source. This ensures that the visual information does not significantly degrade the audio information in situations where the audio performs well, such as a controlled noiseless environment. Bayesian and simple consonance belief structures are explored and compared, along with standard stack-based fusion.
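
The fusion idea described in the abstract can be illustrated with a minimal sketch of Dempster's rule of combination. This is not the authors' implementation: the three-phoneme frame, the viseme grouping {p, b}, and all mass values are hypothetical, chosen only to show how a Bayesian source (mass on single phonemes only) combines with a simple imprecise source (mass on one subset, with the remainder on the whole frame).

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function is a dict mapping a frozenset of hypotheses
    to its mass; the masses in each dict are assumed to sum to 1.
    Returns the normalized combined masses and the conflict mass.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Agreeing evidence: the product mass goes to the intersection.
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Disjoint focal sets: the product mass counts as conflict.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    norm = 1.0 - conflict
    return {s: w / norm for s, w in combined.items()}, conflict


# Hypothetical frame of discernment with three phonemes.
frame = frozenset({"p", "b", "a"})

# Audio: Bayesian source, mass on singletons only (illustrative values).
audio = {
    frozenset({"p"}): 0.5,
    frozenset({"b"}): 0.3,
    frozenset({"a"}): 0.2,
}

# Visual: simple imprecise source. The lips suggest a bilabial (p or b)
# but cannot tell which; the remaining mass expresses ignorance.
visual = {
    frozenset({"p", "b"}): 0.6,
    frame: 0.4,
}

fused, conflict = dempster_combine(audio, visual)
for focal_set, mass in sorted(fused.items(), key=lambda kv: -kv[1]):
    print(sorted(focal_set), round(mass, 4))
```

Under these assumed numbers, the fused result stays Bayesian (singletons only), and the visual evidence shifts mass toward "p" and "b" without changing their relative order. If the visual source puts all of its mass on the whole frame (complete ignorance), the audio masses are returned unchanged, which is consistent with the abstract's point that imprecise visual evidence need not degrade a well-performing audio source.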

Keywords

Bayes methods; audio signal processing; image recognition; inference mechanisms; sensor fusion; speech processing; speech recognition; Bayesian source; audio-visual automatic speech recognition; consonance belief structures; large vocabulary French speech; mouth shape information; phoneme-based Dempster-Shafer fusion approach; stack-based fusion; Speech recognition; Vocabulary;

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on

Conference_Location

Toulouse

ISSN

1520-6149

Print_ISBN

1-4244-0469-X

Type

conf

DOI

10.1109/ICASSP.2006.1660091

Filename

1660091