• DocumentCode
    2948640
  • Title
    Production domain modeling of pronunciation for visual speech recognition
  • Author
    Saenko, Kate ; Livescu, Karen ; Glass, James ; Darrell, Trevor
  • Author_Institution
    Comput. Sci. & Artificial Intelligence Lab., MIT, Cambridge, MA, USA
  • Volume
    5
  • fYear
    2005
  • fDate
    18-23 March 2005
  • Abstract
    Articulatory feature models have been proposed in the automatic speech recognition community as an alternative to phone-based models of speech. In this paper, we extend this approach to the visual modality. Specifically, we adapt a recently proposed feature-based model of pronunciation variation to visual speech recognition (VSR) using a set of visually-salient features. The model uses a dynamic Bayesian network (DBN) to represent the evolution of the feature streams. A bank of SVM feature classifiers, with outputs converted to likelihoods, provides input to the DBN. We present preliminary experiments on an isolated-word VSR task, comparing feature-based and viseme-based units and studying the effects of modeling inter-feature asynchrony.
  • Keywords
    belief networks; feature extraction; image classification; radial basis function networks; speech recognition; support vector machines; VSR; articulatory feature models; dynamic Bayesian network; feature-based pronunciation modeling; inter-feature asynchrony modeling; isolated-word VSR task; lipreading; radial basis function SVM classifier; visual speech recognition; Artificial intelligence; Computer science; Glass; Laboratories; Lips; Mouth; Speech recognition; Support vector machine classification; Support vector machines; Tongue
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 2005. Proceedings. (ICASSP '05). IEEE International Conference on
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-8874-7
  • Type
    conf
  • DOI
    10.1109/ICASSP.2005.1416343
  • Filename
    1416343