• DocumentCode
    695556
  • Title
    Multipose audio-visual speech recognition
  • Author
    Estellers, Virginia; Thiran, Jean-Philippe
  • Author_Institution
    Signal Processing Laboratory (LTS5), École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  • fYear
    2011
  • fDate
    Aug. 29, 2011 - Sept. 2, 2011
  • Firstpage
    1065
  • Lastpage
    1069
  • Abstract
    In this paper we study the adaptation of visual and audio-visual speech recognition systems to non-ideal visual conditions. We focus on the effects of changes in the speaker's pose relative to the camera, a problem encountered in natural situations. To that end, we introduce a pose normalization technique and perform speech recognition from multiple views by generating virtual frontal views from non-frontal images. The proposed method is inspired by pose-invariant face recognition studies and relies on linear regression to find an approximate mapping between images from different poses. Lipreading experiments quantify the loss of performance caused by pose changes and evaluate the proposed pose normalization techniques, while audio-visual experiments analyse how an audio-visual system should account for non-frontal poses in terms of the weight assigned to the visual modality in the audio-visual classifier.
  • Keywords
    audio-visual systems; face recognition; speech recognition; approximate mapping; audio-visual classifier; audio-visual system; multipose audio-visual speech recognition; pose normalization technique; pose normalization techniques; pose-invariant face recognition; visual modality; Discrete cosine transforms; Feature extraction; Mouth; Speech; Speech recognition; Visualization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signal Processing Conference, 2011 19th European
  • Conference_Location
    Barcelona
  • ISSN
    2076-1465
  • Type
    conf
  • Filename
    7073867