Title :
Discriminative training of HMM stream exponents for audio-visual speech recognition
Author :
Potamianos, Gerasimos ; Graf, Hans Peter
Author_Institution :
AT&T Labs., Florham Park, NJ, USA
Abstract :
We propose the use of discriminative training by means of the generalized probabilistic descent (GPD) algorithm to estimate hidden Markov model (HMM) stream exponents for audio-visual speech recognition. Synchronized audio and visual features are used to train audio-only and visual-only single-stream HMMs, respectively, of identical topology by maximum likelihood. A two-stream HMM is then obtained by combining the two single-stream HMMs and introducing exponents that weigh the log-likelihood of each stream. We present the GPD algorithm for stream exponent estimation, consider a possible initialization, and apply it to the single-speaker connected letters task of the AT&T bimodal database. We demonstrate that the resulting multi-stream HMM outperforms the audio-only, visual-only, and audio-visual single-stream HMMs.
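As a rough illustration of the stream-exponent idea in the abstract, the sketch below combines per-stream state log-likelihoods using exponents as weights. It is a minimal sketch, not the authors' implementation: the function name, the example likelihood values, and the exponent values 0.7 and 0.3 are all hypothetical, and the GPD estimation of the exponents is not shown.

```python
import math


def two_stream_log_likelihood(log_b_audio, log_b_visual,
                              lambda_audio, lambda_visual):
    """Combine per-stream state log-likelihoods with stream exponents.

    Raising each stream likelihood to an exponent and multiplying is
    equivalent to a weighted sum of the stream log-likelihoods.
    """
    return lambda_audio * log_b_audio + lambda_visual * log_b_visual


# Hypothetical per-frame state likelihoods for the two streams.
log_b_a = math.log(0.02)   # audio stream (assumed value)
log_b_v = math.log(0.10)   # visual stream (assumed value)

# Hypothetical exponents; in the paper these are estimated discriminatively.
combined = two_stream_log_likelihood(log_b_a, log_b_v, 0.7, 0.3)
print(combined)
```

Setting one exponent to 1 and the other to 0 reduces the score to the corresponding single-stream log-likelihood.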
Keywords :
audio-visual systems; feature extraction; hidden Markov models; maximum likelihood estimation; probability; speech recognition; synchronisation; AT&T bimodal database; HMM stream exponents; audio features; audio-only stream; audio-visual speech recognition; discriminative training; generalized probabilistic descent algorithm; hidden Markov model; initialization; log-likelihood; maximum likelihood; single speaker connected letters task; stream exponent estimation; synchronized features; two-stream HMM; visual features; visual-only stream; Automatic speech recognition; Hidden Markov models; Lips; Mutual information; Speech recognition; Streaming media; Testing; Topology; Visual databases; Vocabulary;
Conference_Title :
Acoustics, Speech and Signal Processing, 1998. Proceedings of the 1998 IEEE International Conference on
Conference_Location :
Seattle, WA
Print_ISBN :
0-7803-4428-6
DOI :
10.1109/ICASSP.1998.679695