DocumentCode :
1884160
Title :
A hybrid approach to bimodal speech recognition
Author :
Bregler, Christoph ; Omohundro, Stephen M. ; Konig, Yochai
Author_Institution :
Div. of Comput. Sci., California Univ., Berkeley, CA, USA
Volume :
1
fYear :
1994
fDate :
31 Oct-2 Nov 1994
Firstpage :
556
Abstract :
We explore multimodal recognition by combining visual lipreading with acoustic speech recognition. We show that combining visual and acoustic speech information improves recognition performance significantly, especially in noisy environments. This is achieved with a hybrid speech recognition architecture consisting of a new visual learning and tracking mechanism, a channel-robust acoustic front end, a connectionist phone classifier, and an HMM-based sentence classifier. Our focus in this paper is on the visual subsystem, which is based on “surface-learning” and active vision models. Our bimodal hybrid speech recognition system has already been applied to a multi-speaker spelling task, and work is in progress to apply it to a speaker-independent spontaneous speech task, the “Berkeley Restaurant Project” (BeRP).
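The abstract names the components of the hybrid architecture but does not say how the acoustic and visual streams are merged or how the connectionist phone classifier feeds the HMM. The Python sketch below shows one common way such a system can be wired, and it rests on stated assumptions rather than on the paper. It assumes that each modality yields per-frame phone posteriors and that these are merged by a weighted log-linear rule, where the `audio_weight` parameter is hypothetical. The merged posteriors are divided by phone priors to give scaled likelihoods, which is the standard hybrid HMM/MLP substitution, and are then decoded with Viterbi over a small left-to-right HMM. All function names and the toy numbers are illustrative, and this is not the authors' method.

```python
import numpy as np


def fuse_posteriors(p_audio, p_video, audio_weight=0.7, eps=1e-12):
    """Log-linear fusion of per-frame phone posteriors (T x Q) from two modalities.

    Assumption: audio_weight is a hypothetical knob; lowering it when the
    acoustic channel is noisy lets the lipreading stream count for more.
    """
    log_p = audio_weight * np.log(p_audio + eps) + (1.0 - audio_weight) * np.log(p_video + eps)
    log_p -= log_p.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)  # renormalize each frame to a distribution


def scaled_log_likelihoods(posteriors, priors, eps=1e-12):
    """Hybrid HMM/MLP substitution: p(x | q) is proportional to P(q | x) / P(q)."""
    return np.log(posteriors + eps) - np.log(priors + eps)


def viterbi(log_emit, log_trans, log_init):
    """Best HMM state path given per-frame log emission scores (T x Q)."""
    T, Q = log_emit.shape
    delta = log_init + log_emit[0]
    backptr = np.zeros((T, Q), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans  # scores[i, j]: arrive in state j from state i
        backptr[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emit[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):  # trace the back-pointers from the last frame
        path.append(int(backptr[t, path[-1]]))
    return path[::-1], float(delta.max())


if __name__ == "__main__":
    # Toy data: a nearly uninformative (noisy) acoustic stream, a sharper visual stream.
    p_audio = np.array([[0.40, 0.30, 0.30],
                        [0.34, 0.33, 0.33],
                        [0.30, 0.40, 0.30],
                        [0.33, 0.33, 0.34]])
    p_video = np.array([[0.8, 0.1, 0.1],
                        [0.1, 0.8, 0.1],
                        [0.1, 0.8, 0.1],
                        [0.1, 0.1, 0.8]])
    priors = np.full(3, 1.0 / 3.0)
    # Left-to-right topology: each state either repeats or advances by one.
    trans = np.array([[0.5, 0.5, 0.0],
                      [0.0, 0.5, 0.5],
                      [0.0, 0.0, 1.0]])
    init = np.array([1.0, 0.0, 0.0])

    fused = fuse_posteriors(p_audio, p_video, audio_weight=0.3)
    emit = scaled_log_likelihoods(fused, priors)
    with np.errstate(divide="ignore"):  # log(0) -> -inf marks forbidden transitions
        path, score = viterbi(emit, np.log(trans), np.log(init))
    print(path)  # [0, 1, 1, 2]
```

Log-linear fusion with a tunable weight is only one option. Taking a weighted sum of posteriors, or setting the weight per frame from the entropy of each stream, would be equally reasonable stand-ins. The prior division matters because a discriminatively trained classifier outputs posteriors, while HMM decoding expects likelihoods.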
Keywords :
acoustic signal processing; active vision; feedforward neural nets; hidden Markov models; learning (artificial intelligence); multilayer perceptrons; speech recognition; spelling aids; tracking; vision; Berkeley Restaurant Project; HMM based sentence classifier; acoustic speech information; acoustic speech recognition; active vision models; bimodal hybrid speech recognition system; channel robust acoustic front end; connectionist phone classifier; hybrid speech recognition architecture; multi-speaker spelling task; multilayer perceptron; multimodal recognition; noisy environments; recognition performance; speaker independent spontaneous speech task; surface-learning; tracking mechanism; visual information; visual lipreading; visual subsystem; Acoustic distortion; Acoustic noise; Computer science; Crosstalk; Hidden Markov models; Loudspeakers; Speech recognition; Visual databases; Vocabulary; Working environment noise;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
1994 Conference Record of the Twenty-Eighth Asilomar Conference on Signals, Systems and Computers
Conference_Location :
Pacific Grove, CA
ISSN :
1058-6393
Print_ISBN :
0-8186-6405-3
Type :
conf
DOI :
10.1109/ACSSC.1994.471514
Filename :
471514