Audio-visual speech recognition in a Portuguese language based application

Author

Pera, Vitor ; Sá, Filipe ; Afonso, Pedro ; Ferreira, Ricardo

Author_Institution

Fac. of Eng., Porto Univ., Portugal

Volume

2

fYear

2003

fDate

10-12 Dec. 2003

Firstpage

688

Abstract

We present experimental results obtained with an automatic speech recogniser developed for a speaker-dependent, continuous-speech alphanumeric recognition application in European Portuguese. The system was designed and built following an audio-visual speech recognition approach. Besides the well-known complementarity between acoustic and visual information for speech recognition, visual features are immune to acoustic disturbances, which makes the system more robust in acoustically contaminated environments. The results show that adding a video stream, using a multi-stream decoding formalism, reduces the word error rate by approximately 56% (relative) over a wide range of acoustic signal-to-noise ratios.
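The abstract names a multi-stream decoding formalism without giving its details. A common formulation, assumed here and not confirmed as the authors' exact method, scores each HMM state by a weighted sum of per-stream log-likelihoods, log b(o) = λ_a log b_a(o_a) + λ_v log b_v(o_v), with λ_a + λ_v = 1. The minimal Python sketch below uses single diagonal-covariance Gaussians per stream for brevity. The function names and the weight constraint are illustrative assumptions. It also shows how a relative word error rate reduction such as the reported 56% is computed: (WER_baseline − WER_new) / WER_baseline.

```python
import math


def diag_gaussian_loglik(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian evaluated at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def multistream_loglik(audio_obs, video_obs, audio_model, video_model, audio_weight):
    """Stream-weighted state score: lambda_a * log b_a + lambda_v * log b_v.

    Assumption (hypothetical): the stream weights sum to one, so the video
    weight is 1 - audio_weight. Each model is a (mean, var) pair.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    video_weight = 1.0 - audio_weight
    log_audio = diag_gaussian_loglik(audio_obs, *audio_model)
    log_video = diag_gaussian_loglik(video_obs, *video_model)
    return audio_weight * log_audio + video_weight * log_video


def relative_wer_reduction(baseline_wer, new_wer):
    """Relative WER reduction as a fraction: (baseline - new) / baseline."""
    return (baseline_wer - new_wer) / baseline_wer
```

In such schemes the audio weight is typically lowered as the acoustic signal-to-noise ratio drops, so that the video stream, which is unaffected by acoustic noise, carries more of the decision.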

Keywords

acoustic signal processing; audio signal processing; decoding; feature extraction; natural languages; speech recognition; video signal processing; Portuguese language; acoustic information; acoustical signal to noise ratio; audio visual speech recognition; automatic speech recogniser; continuous speech alphanumeric recognition; multistream decoding; speaker dependent recognition; video stream; visual information; word error rate; Audio databases; Automatic speech recognition; Decoding; Natural languages; Particle separators; Robustness; Speech recognition; Streaming media; Video compression; Visual databases;

fLanguage

English

Publisher

IEEE

Conference_Titel

Industrial Technology, 2003 IEEE International Conference on

Print_ISBN

0-7803-7852-0

Type

conf

DOI

10.1109/ICIT.2003.1290738

Filename

1290738