DocumentCode
2018595
Title
Improving connected letter recognition by lipreading
Author
Bregler, Christoph ; Hild, Hermann ; Manke, Stefan ; Waibel, Alex
Author_Institution
Dept. of Comput. Sci., Karlsruhe Univ., Germany
Volume
1
fYear
1993
fDate
27-30 April 1993
Firstpage
557
Abstract
The authors show how recognition performance in automated speech perception can be significantly improved by adding lipreading, so-called speech-reading. They demonstrate this on an extension of a state-of-the-art speech recognition system, a modular multistage time delay neural network architecture (MS-TDNN). The acoustic and visual speech data are preclassified in two separate front-end phoneme TDNNs and combined into acoustic-visual hypotheses for the dynamic time warping algorithm. The approach is evaluated on a connected word recognition problem, the notoriously difficult letter spelling task. With speech-reading, the error rate was reduced by up to half relative to purely acoustic recognition.
Keywords
audio acoustics; audio-visual systems; errors; neural nets; speech recognition; acoustic-visual hypotheses; automated speech perception; connected letter recognition; dynamic time warping algorithm; error rate; lipreading; modular multistage time delay neural network; speech recognition system; speech-reading;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing, 1993. ICASSP-93., 1993 IEEE International Conference on
Conference_Location
Minneapolis, MN, USA
ISSN
1520-6149
Print_ISBN
0-7803-7402-9
Type
conf
DOI
10.1109/ICASSP.1993.319179
Filename
319179