Title :
Multistream Articulatory Feature-Based Models for Visual Speech Recognition
Author :
Saenko, Kate ; Livescu, Karen ; Glass, James ; Darrell, Trevor
Author_Institution :
Comput. Sci. & Artificial Intell. Lab., MIT, Cambridge, MA, USA
Abstract :
We study the problem of automatic visual speech recognition (VSR) using dynamic Bayesian network (DBN)-based models consisting of multiple sequences of hidden states, each corresponding to an articulatory feature (AF) such as lip opening (LO) or lip rounding (LR). A bank of discriminative articulatory feature classifiers provides input to the DBN, in the form of either virtual evidence (VE) (scaled likelihoods) or raw classifier margin outputs. We present experiments on two tasks, a medium-vocabulary word-ranking task and a small-vocabulary phrase recognition task. We show that articulatory feature-based models outperform baseline models, and we study several aspects of the models, such as the effects of allowing articulatory asynchrony, of using dictionary-based versus whole-word models, and of incorporating classifier outputs via virtual evidence versus alternative observation models.
Keywords :
Bayes methods; speech recognition; visual perception; automatic visual speech recognition; dictionary-based versus whole-word model; discriminative articulatory feature classifier; dynamic Bayesian network model; medium-vocabulary word-ranking task; multistream articulatory feature-based model; vocabulary phrase recognition task; Acoustic noise; Bayesian methods; Glass; Hidden Markov models; Humans; Loudspeakers; Mouth; Support vector machine classification; Support vector machines; Visual speech recognition; articulatory features; dynamic Bayesian networks; Algorithms; Computer Simulation; Image Enhancement; Image Interpretation, Computer-Assisted; Lip; Lipreading; Models, Anatomic; Models, Biological; Pattern Recognition, Automated; Speech Production Measurement; Speech Recognition Software
Journal_Title :
Pattern Analysis and Machine Intelligence, IEEE Transactions on
DOI :
10.1109/TPAMI.2008.303