Scale based features for audiovisual speech recognition

Author

Matthews, I.A. ; Bangham, J.A. ; Cox, S.J.

Author_Institution

Sch. of Inf. Syst., East Anglia Univ., Norwich, UK

fYear

1996

fDate

28 Nov 1996

Firstpage

8/1

Lastpage

8/7

Abstract

This paper demonstrates the use of nonlinear image decomposition, in the form of a sieve, applied to the task of audiovisual speech recognition of a database of the letters A-Z for ten talkers. A scale-based feature vector is formed directly from the grayscale pixels of an image containing the talker's mouth on a per-frame basis. This is independent of image amplitude and position information, and neither accurate tracking nor special markers are required. Results are presented for audio-only, visual-only, and early and late integrated audiovisual cases.

Keywords

audio-visual systems; audiovisual speech recognition; database; feature vector; grayscale pixels; image amplitude; nonlinear image decomposition; scale based features; sieve; tracking;

fLanguage

English

Publisher

IET

Conference_Titel

Integrated Audio-Visual Processing for Recognition, Synthesis and Communication (Digest No: 1996/213), IEE Colloquium on

Conference_Location

London

Type

conf

DOI

10.1049/ic:19961152

Filename

645684

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=3280694