• DocumentCode
    3494759
  • Title
    Overcoming asynchrony in Audio-Visual Speech Recognition
  • Author
    Estellers, Virginia ; Thiran, Jean-Philippe
  • Author_Institution
    Signal Process. Lab., Ecole Polytech. Fed. de Lausanne (EPFL), Lausanne, Switzerland
  • fYear
    2010
  • fDate
    4-6 Oct. 2010
  • Firstpage
    466
  • Lastpage
    471
  • Abstract
    In this paper we propose two alternatives to overcome the natural asynchrony of modalities in Audio-Visual Speech Recognition. We first investigate the use of asynchronous statistical models based on Dynamic Bayesian Networks with different levels of asynchrony. We show that audio-visual models should consider asynchrony within word boundaries and not at the phoneme level. The second approach adds a processing step to the features before they are used for recognition. The proposed technique aligns the temporal evolution of the audio and video streams in terms of a speech-recognition system and enables the use of simpler statistical models for classification. In both cases we report experiments with the CUAVE database, showing the improvements obtained with the proposed asynchronous model and feature processing technique compared to traditional systems.
  • Keywords
    Bayes methods; audio signal processing; audio-visual systems; image classification; image recognition; speech recognition; CUAVE database; asynchronous statistical models; audio streams; audio-visual speech recognition; dynamic Bayesian network; feature processing; natural asynchrony; temporal evolution; video streams; Bayesian methods; Complexity theory; Hidden Markov models; Speech; Speech recognition; Visualization; Vocabulary;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Multimedia Signal Processing (MMSP), 2010 IEEE International Workshop on
  • Conference_Location
    Saint Malo
  • Print_ISBN
    978-1-4244-8110-1
  • Electronic_ISBN
    978-1-4244-8111-8
  • Type
    conf
  • DOI
    10.1109/MMSP.2010.5662066
  • Filename
    5662066