Intensity/location normalization for automatic lipreading

Author

Tanaka, Akiji ; Vanegas, Oscar ; Tokuda, Keiichi ; Kitamura, Tadashi

Author_Institution

Dept. of Comput. Sci., Nagoya Inst. of Technol., Japan

Volume

2

fYear

1998

fDate

1998

Firstpage

920

Abstract

This paper describes intensity and location normalization for improving the performance of bimodal speech recognition systems that use visual information. In conventional speech recognition, many methods have been proposed for normalizing channel characteristics and speaker individuality. In this study, two methods analogous to cepstral mean normalization (CMN) and speaker adaptive training (SAT) are proposed for intensity and location normalization, respectively, and two kinds of feature vectors, a subsampled image and a 2D-DCT, are compared. Experimental results show that the normalization techniques substantially improve recognition rates.

Keywords

discrete cosine transforms; feature extraction; image recognition; image sampling; image sequences; speech recognition; 2D-DCT; automatic lipreading; bimodal speech recognition; channel characteristic normalization; feature vectors; intensity/location normalization; recognition rate; speaker individuality; speech recognition system; subsampled image; visual information; Cepstral analysis; Character recognition; Computer science; Discrete cosine transforms; Error analysis; Image recognition; Image sequences; Lips; Robustness; Speech recognition

fLanguage

English

Publisher

IEEE

Conference_Titel

Signal Processing Proceedings, 1998. ICSP '98. 1998 Fourth International Conference on

Conference_Location

Beijing

Print_ISBN

0-7803-4325-5

Type

conf

DOI

10.1109/ICOSP.1998.770762

Filename

770762