Title :
A Multi-Stream Approach to Audiovisual Automatic Speech Recognition
Author :
Hasegawa-Johnson, Mark
Author_Institution :
Univ. of Illinois, Urbana
Abstract :
This paper proposes a multi-stream approach to automatic audiovisual speech recognition, based in part on Hickok and Poeppel's dual-stream model of human speech processing. The dual-stream model proposes that semantic networks may be accessed by at least three parallel neural streams: at least two ventral streams that map directly from acoustics to words (with different time scales), and at least one dorsal stream that maps from acoustics to articulation. Our implementation represents each of these streams by a dynamic Bayesian network; disagreements between the three streams are resolved using a voting scheme. The proposed algorithm was tested using the CUAVE audiovisual speech corpus. Results indicate that the ventral stream model tends to make fewer mistakes in the labeling of vowels, while the dorsal stream model tends to make fewer mistakes in the labeling of consonants; the recognizer voting scheme takes advantage of these differences to reduce overall word error rate.
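The abstract does not say how the votes are counted. The following is a minimal sketch of one possible three-way voting rule, not the paper's method. It assumes the word hypotheses from the three streams have already been time-aligned to equal length (for example, by a ROVER-style alignment step, which is not shown). The function `vote` and its `fallback` parameter are hypothetical. A word is accepted when a strict majority of streams agree; when all three disagree, the word from a designated fallback stream is kept.

```python
from collections import Counter
from typing import Sequence


def vote(hypotheses: Sequence[Sequence[str]], fallback: int = 0) -> list[str]:
    """Strict-majority vote over pre-aligned word sequences, one per stream.

    Hypothetical sketch: assumes alignment has already produced sequences
    of equal length. If no word wins a strict majority in a slot, the word
    from stream index `fallback` is kept.
    """
    lengths = {len(h) for h in hypotheses}
    if len(lengths) != 1:
        raise ValueError("hypotheses must be pre-aligned to equal length")
    result = []
    for slot in zip(*hypotheses):
        # Most frequent word in this slot and how many streams produced it.
        word, count = Counter(slot).most_common(1)[0]
        # Accept only a strict majority; otherwise defer to the fallback stream.
        result.append(word if count > len(slot) // 2 else slot[fallback])
    return result


ventral_a = ["one", "two", "five"]
ventral_b = ["one", "three", "nine"]
dorsal = ["one", "two", "four"]
print(vote([ventral_a, ventral_b, dorsal], fallback=2))
```

With `fallback=2` the dorsal stream breaks the three-way tie in the last slot, so this prints `['one', 'two', 'four']`. A rule that favored a different stream for vowel-dominated versus consonant-dominated words would be one way to exploit the error pattern reported above, but that refinement is an assumption, not something the abstract describes.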
Keywords :
audio-visual systems; belief networks; semantic networks; speech recognition; audiovisual automatic speech recognition; dual-stream model; dynamic Bayesian network; human speech processing; multistream approach; parallel neural streams; word error rate; Acoustic testing; Automatic speech recognition; Bayesian methods; Error analysis; Humans; Labeling; Speech processing; Speech recognition; Streaming media; Voting;
Conference_Title :
Multimedia Signal Processing, 2007. MMSP 2007. IEEE 9th Workshop on
Conference_Location :
Crete
Print_ISBN :
978-1-4244-1274-7
DOI :
10.1109/MMSP.2007.4412884