• DocumentCode
    705876
  • Title
    Low-dimensional motion features for audio-visual speech recognition
  • Author
    Valles Carboneras, Andres ; Gurban, Mihai ; Thiran, Jean-Philippe
  • Author_Institution
    E.T.S.I. de Telecomun., Univ. Politec. de Madrid, Madrid, Spain
  • fYear
    2007
  • fDate
    3-7 Sept. 2007
  • Firstpage
    297
  • Lastpage
    301
  • Abstract
    Audio-visual speech recognition promises to improve the performance of speech recognizers, especially when the audio is corrupted, by adding information from the visual modality, more specifically, from the video of the speaker. However, the number of visual features that are added is typically bigger than the number of audio features, for a small gain in accuracy. We present a method that shows gains in performance comparable to the commonly-used DCT features, while employing a much smaller number of visual features based on the motion of the speaker's mouth. Motion vector differences are used to compensate for errors in the mouth tracking. This leads to a good performance even with as few as 3 features. The advantage of low-dimensional features is that a good accuracy can be obtained with relatively little training data, while also increasing the speed of both training and testing.
  • Keywords
    audio signal processing; audio-visual systems; discrete cosine transforms; speaker recognition; DCT features; audio features; audio-visual speech recognition; low-dimensional features; low-dimensional motion features; motion vector; mouth tracking; speaker mouth; visual features; visual modality; Discrete cosine transforms; Feature extraction; Hidden Markov models; Mouth; Optical imaging; Speech recognition; Visualization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signal Processing Conference, 2007 15th European
  • Conference_Location
    Poznan
  • Print_ISBN
    978-839-2134-04-6
  • Type
    conf
  • Filename
    7098812