• DocumentCode
    748078
  • Title
    Robust Audio-Visual Speech Recognition Based on Late Integration
  • Author
    Lee, Jong-Seok ; Park, Cheol Hoon
  • Author_Institution
    Sch. of Electr. Eng. & Comput. Sci., KAIST, Daejeon
  • Volume
    10
  • Issue
    5
  • fYear
    2008
  • Firstpage
    767
  • Lastpage
    779
  • Abstract
    Audio-visual speech recognition (AVSR), which uses both the acoustic and visual signals of speech, has received attention because of its robustness in noisy environments. In this paper, we present an AVSR system based on a late integration scheme, whose robustness under various noise conditions is improved by enhancing the performance of the three parts composing the system. First, we improve the performance of the visual subsystem by applying a stochastic optimization method to the hidden Markov models used as the speech recognizer. Second, we propose a new method that considers the dynamic characteristics of speech to improve the robustness of the acoustic subsystem. Third, the acoustic and visual subsystems are integrated by neural networks to produce robust final recognition results. We demonstrate the performance of the proposed methods via speaker-independent isolated word recognition experiments. The results show that the proposed system improves robustness over the conventional system under various noise conditions without a priori knowledge about the noise contained in the speech.
  • Keywords
    audio-visual systems; hidden Markov models; neural nets; speech recognition; acoustic subsystem; audio-visual speech recognition; neural networks; noisy environments; interframe correlation; late integration; robustness; stochastic optimization
  • fLanguage
    English
  • Journal_Title
    Multimedia, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1520-9210
  • Type
    jour
  • DOI
    10.1109/TMM.2008.922789
  • Filename
    4540195