• DocumentCode
    179570
  • Title
    A compact formulation of turbo audio-visual speech recognition
  • Author
    Receveur, Simon; Meyer, P.; Fingscheidt, Tim

  • Author_Institution
    Institute for Communications Technology, Technische Universität Braunschweig, Braunschweig, Germany
  • fYear
    2014
  • fDate
    4-9 May 2014
  • Firstpage
    5517
  • Lastpage
    5521
  • Abstract
    Since most automatic speech recognition (ASR) systems still suffer from adverse acoustic conditions and insufficient acoustic modeling, recognition robustness can be improved by integrating further information sources such as additional acoustic channels, modalities, or models. The question of information fusion reveals interesting parallels to problems in digital communications, where the turbo principle revolutionized reliable communication. In this paper, we provide new perspectives on turbo ASR. First, we introduce a compact formulation of turbo automatic speech recognition. Second, we present a shape-based visual feature extraction algorithm that requires no learning. Third, we show an application to an audio-visual speech recognition task on a large data set, where our proposed method clearly outperforms both the iterative approach introduced by Shivappa et al. and a conventional coupled-hidden-Markov-model approach, with a relative word error rate reduction of up to 23.8%.
  • Keywords
    audio coding; audio-visual systems; digital communication; error statistics; feature extraction; hidden Markov models; iterative methods; speech recognition; turbo codes; acoustic channels; adverse acoustic conditions; automatic speech recognition; conventional coupled hidden Markov model approach; information fusion; insufficient acoustic modeling; iterative approach; recognition robustness; shape-based visual feature extraction algorithm; turbo ASR; turbo audio-visual speech recognition; turbo principle; word error rate; Acoustics; Iterative decoding; Signal to noise ratio; Speech; Multimedia systems
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
  • Conference_Location
    Florence
  • Type
    conf
  • DOI
    10.1109/ICASSP.2014.6854658
  • Filename
    6854658