A visual front-end for a continuous pose-invariant lipreading system

Author

Lucey, Patrick; Sridharan, Sridha

Author_Institution

Image & Video Technol. Lab., Queensland Univ. of Technol., Brisbane, QLD

fYear

2008

fDate

15-17 Dec. 2008

Firstpage

1

Lastpage

6

Abstract

An audio-visual automatic speech recognition (AVASR) system that can recognise what a speaker says regardless of head position (i.e. left profile, front, right profile, etc.) would be most useful, as it would enable this technology to be used in a host of realistic applications such as mobile phone and in-vehicle speech recognition. A major hurdle in achieving this goal is developing a visual front-end that can effectively locate and track a user's face and facial features from a single camera. In this paper, we describe a visual front-end that incorporates a pose estimator in conjunction with a parallel series of pose-specific face and facial feature classifiers based on the boosted cascade of simple classifiers devised by Viola and Jones [6]. The visual front-end is evaluated on the CUAVE database. We also give lipreading results on the CUAVE database, which show that AVASR whilst a speaker is moving their head is indeed achievable.

Keywords

audio-visual systems; face recognition; speech recognition; audio-visual automatic speech recognition system; continuous pose-invariant lipreading system; facial feature classifier; in-vehicle speech recognition; mobile phone; pose-estimator; single camera; visual front-end; Automatic speech recognition; Cameras; Facial features; Head; Mobile handsets; Mouth; Spatial databases; Speech recognition; Testing; Visual databases

fLanguage

English

Publisher

IEEE

Conference_Titel

Signal Processing and Communication Systems, 2008. ICSPCS 2008. 2nd International Conference on

Conference_Location

Gold Coast, QLD

Print_ISBN

978-1-4244-4243-0

Electronic_ISBN

978-1-4244-4243-0

Type

conf

DOI

10.1109/ICSPCS.2008.4813664

Filename

4813664