• DocumentCode
    2961980
  • Title
    Audio-visual speech synchronization detection using a bimodal linear prediction model
  • Author
    Kumar, Kush ; Navratil, J. ; Marcheret, E. ; Libal, V. ; Ramaswamy, Ganesh ; Potamianos, Gerasimos
  • Author_Institution
    Carnegie Mellon Univ., Pittsburgh, PA, USA
  • fYear
    2009
  • fDate
    20-25 June 2009
  • Firstpage
    53
  • Lastpage
    59
  • Abstract
    In this work, we study the problem of detecting audio-visual (AV) synchronization in video segments containing a speaker in frontal head pose. The problem holds important applications in biometrics, for example spoofing detection, and it constitutes an important step in AV segmentation necessary for deriving AV fingerprints in multimodal speaker recognition. To attack the problem, we propose a time-evolution model for AV features and derive an analytical approach to capture the notion of synchronization between them. We report results on an appropriate AV database, using two types of visual features extracted from the speaker's facial area: geometric ones and features based on the discrete cosine image transform. Our results demonstrate that the proposed approach provides substantially better AV synchrony detection than a baseline method that employs mutual information, with the geometric visual features outperforming the image transform ones.
  • Keywords
    audio databases; audio-visual systems; biometrics (access control); discrete cosine transforms; face recognition; feature extraction; image segmentation; signal detection; speaker recognition; synchronisation; video signal processing; AV database; audio-visual speech synchronization detection; bimodal linear prediction model; biometrics; discrete cosine image transform; geometric visual feature; speaker facial area; speaker frontal head pose; time-evolution model; video segment; visual feature extraction; Biometrics; Discrete transforms; Fingerprint recognition; Head; Image databases; Predictive models; Spatial databases; Speaker recognition; Speech; Visual databases; Audio-Visual Synchronization; Linear Prediction; Mutual Information; Visual Features;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Vision and Pattern Recognition Workshops, 2009. CVPR Workshops 2009. IEEE Computer Society Conference on
  • Conference_Location
    Miami, FL
  • ISSN
    2160-7508
  • Print_ISBN
    978-1-4244-3994-2
  • Type
    conf
  • DOI
    10.1109/CVPRW.2009.5204303
  • Filename
    5204303