DocumentCode
695556
Title
Multipose audio-visual speech recognition
Author
Estellers, Virginia ; Thiran, Jean-Philippe
Author_Institution
Signal Process. Lab. LTS5, Ecole Polytech. Fed. de Lausanne (EPFL), Lausanne, Switzerland
fYear
2011
fDate
Aug. 29 2011-Sept. 2 2011
Firstpage
1065
Lastpage
1069
Abstract
In this paper, we study the adaptation of visual and audio-visual speech recognition systems to non-ideal visual conditions. We focus on the effects of changes in the speaker's pose relative to the camera, a problem encountered in natural situations. To this end, we introduce a pose normalization technique and perform speech recognition from multiple views by generating virtual frontal views from non-frontal images. The proposed method is inspired by pose-invariant face recognition studies and relies on linear regression to find an approximate mapping between images from different poses. Lipreading experiments quantify the loss of performance associated with pose changes and with the proposed pose normalization techniques, while audio-visual results analyse how an audio-visual system should account for non-frontal poses through the weight assigned to the visual modality in the audio-visual classifier.
Keywords
audio-visual systems; face recognition; speech recognition; approximate mapping; audio-visual classifier; audio-visual system; multipose audio-visual speech recognition; pose normalization technique; pose normalization techniques; pose-invariant face recognition; visual modality; Discrete cosine transforms; Feature extraction; Mouth; Speech; Speech recognition; Visualization;
fLanguage
English
Publisher
IEEE
Conference_Titel
Signal Processing Conference, 2011 19th European
Conference_Location
Barcelona
ISSN
2076-1465
Type
conf
Filename
7073867