A compact formulation of turbo audio-visual speech recognition

Author

Receveur, Simon ; Meyer, P. ; Fingscheidt, Tim

Author_Institution

Inst. for Commun. Technol., Tech. Univ. Braunschweig, Braunschweig, Germany

fYear

2014

fDate

4-9 May 2014

Firstpage

5517

Lastpage

5521

Abstract

Since most automatic speech recognition (ASR) systems still suffer from adverse acoustic conditions and insufficient acoustic modeling, recognition robustness can be improved by integrating further information sources such as additional acoustic channels, modalities, or models. On the question of information fusion, there are interesting parallels to problems in digital communications, where the turbo principle revolutionized reliable communication. In this paper, we provide new perspectives on turbo ASR. First, we introduce a compact formulation of turbo automatic speech recognition. Second, we present a shape-based visual feature extraction algorithm that requires no learning paradigm. Third, we show an application to an audio-visual speech recognition task on a large data set, where our proposed method clearly outperforms both the iterative approach introduced by Shivappa et al. and a conventional coupled-hidden-Markov-model approach, by up to 23.8% relative reduction in word error rate.

Keywords

audio coding; audio-visual systems; digital communication; error statistics; feature extraction; hidden Markov models; iterative methods; speech recognition; turbo codes; acoustic channels; adverse acoustic conditions; automatic speech recognition; conventional coupled hidden Markov model approach; information fusion; insufficient acoustic modeling; iterative approach; recognition robustness; shape-based visual feature extraction algorithm; turbo ASR; turbo audio-visual speech recognition; turbo principle; word error rate; Acoustics; Iterative decoding; Signal to noise ratio; Speech; Multimedia systems

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on

Conference_Location

Florence

Type

conf

DOI

10.1109/ICASSP.2014.6854658

Filename

6854658