• DocumentCode
    2704401
  • Title

    Dynamic Stream Weight Modeling for Audio-Visual Speech Recognition

  • Author

    Marcheret, E. ; Libal, V. ; Potamianos, Gerasimos

  • Author_Institution
    IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA
  • Volume
    4
  • fYear
    2007
  • fDate
    15-20 April 2007
  • Abstract
    To generate optimal multi-stream audio-visual speech recognition performance, appropriate dynamic weighting of each modality is desired. In this paper, we propose to estimate such weights based on a combination of acoustic signal space observations and single-modality audio and visual speech model likelihoods. Two modeling approaches are investigated for such weight estimation: one based on a sigmoid fitting function, the other employing Gaussian mixture models. Reported experiments demonstrate that the latter approach outperforms sigmoid-based modeling, and is dramatically superior to the static weighting scheme.
  • Keywords
    Gaussian processes; audio-visual systems; speech processing; speech recognition; Gaussian mixture models; acoustic signal space observations; dynamic stream weight modeling; optimal multistream audio-visual speech recognition; sigmoid fitting function; single-modality audio; static weighting scheme; Automatic speech recognition; Fuses; Hidden Markov models; Linear discriminant analysis; Robustness; Streaming media; Table lookup; Testing; Audio-Visual Speech Recognition; Multi-Modal Fusion; Multi-Stream HMM
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on
  • Conference_Location
    Honolulu, HI
  • ISSN
    1520-6149
  • Print_ISBN
    1-4244-0727-3
  • Type

    conf

  • DOI
    10.1109/ICASSP.2007.367227
  • Filename
    4218258