Title :
Improving connected letter recognition by lipreading
Author :
Bregler, Christoph ; Hild, Hermann ; Manke, Stefan ; Waibel, Alex
Author_Institution :
Dept. of Comput. Sci., Karlsruhe Univ., Germany
Abstract :
The authors show how recognition performance in automated speech perception can be significantly improved by additional lipreading, so-called speech-reading. They demonstrate this on an extension of a state-of-the-art speech recognition system, a modular multistage time delay neural network architecture (MS-TDNN). The acoustic and visual speech data are preclassified in two separate front-end phoneme TDNNs and combined into acoustic-visual hypotheses for the dynamic time warping algorithm. The approach is evaluated on a connected word recognition problem, the notoriously difficult letter spelling task. With speech-reading, the error rate could be reduced by up to half relative to purely acoustic recognition.
Keywords :
audio acoustics; audio-visual systems; errors; neural nets; speech recognition; acoustic-visual hypotheses; automated speech perception; connected letter recognition; dynamic time warping algorithm; error rate; lipreading; modular multistage time delay neural network; speech recognition system; speech-reading;
Conference_Title :
Acoustics, Speech, and Signal Processing, 1993. ICASSP-93., 1993 IEEE International Conference on
Conference_Location :
Minneapolis, MN, USA
Print_ISBN :
0-7803-7402-9
DOI :
10.1109/ICASSP.1993.319179