• DocumentCode
    394734
  • Title
    Using viseme based acoustic models for speech driven lip synthesis
  • Author
    Verma, Ashish ; Rajput, Nitendra ; Subramaniam, L. Venkata
  • Author_Institution
    IBM India Res. Lab., Indian Inst. of Technol., New Delhi, India
  • Volume
    5
  • fYear
    2003
  • fDate
    6-10 April 2003
  • Abstract
    Speech-driven lip synthesis is an interesting and important step toward human-computer interaction. An incoming speech signal is time-aligned using a speech recognizer to generate a phonetic sequence, which is then converted to the corresponding viseme sequence to be animated. We present a novel method for generating the viseme sequence that uses viseme-based acoustic models, instead of the usual phone-based acoustic models, to align the input speech signal. This results in higher accuracy and speed of the alignment procedure and allows a much simpler implementation of the speech-driven lip synthesis system, as it completely obviates the need for an acoustic-unit to visual-unit conversion. We show, through various experiments, that the proposed method yields about 53% relative improvement in classification accuracy and about 52% reduction in the time required to compute alignments.
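    The abstract's key claim is that aligning speech directly with viseme-based acoustic models eliminates the separate phone-to-viseme conversion stage that phone-based systems need. A minimal sketch of that conversion stage (the step the paper's method removes), assuming a hypothetical, much-simplified phone-to-viseme table — real systems use a larger, carefully designed many-to-one mapping:

    ```python
    # Illustrative sketch, not the paper's implementation: collapsing a
    # time-aligned phone sequence into a viseme sequence. The mapping
    # below is a hypothetical toy example.

    # Many-to-one phone -> viseme mapping: phones that look alike on the
    # lips (e.g. the bilabials p/b/m) share one viseme.
    PHONE_TO_VISEME = {
        "p": "bilabial", "b": "bilabial", "m": "bilabial",
        "f": "labiodental", "v": "labiodental",
        "t": "alveolar", "d": "alveolar", "s": "alveolar", "z": "alveolar",
        "aa": "open", "ae": "open",
        "iy": "spread", "ih": "spread",
    }

    def phones_to_visemes(phone_alignment):
        """Convert a time-aligned phone sequence [(phone, start, end), ...]
        into a viseme sequence, merging adjacent identical visemes."""
        visemes = []
        for phone, start, end in phone_alignment:
            viseme = PHONE_TO_VISEME.get(phone, "neutral")
            if visemes and visemes[-1][0] == viseme:
                # Same viseme as the previous segment: extend it
                # instead of starting a new one.
                prev_viseme, prev_start, _ = visemes[-1]
                visemes[-1] = (prev_viseme, prev_start, end)
            else:
                visemes.append((viseme, start, end))
        return visemes

    alignment = [("p", 0.00, 0.08), ("b", 0.08, 0.15),
                 ("aa", 0.15, 0.30), ("m", 0.30, 0.40)]
    print(phones_to_visemes(alignment))
    ```

    With viseme-based acoustic models, the recognizer's alignment output is already a viseme sequence, so this whole mapping pass (and the ambiguity it introduces) disappears.
    
    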
  • Keywords
    computer animation; human computer interaction; image processing; speech recognition; speech-based user interfaces; acoustic models; human-computer interaction; phonetic sequence; speech driven lip synthesis; viseme sequence; Animation; Hidden Markov models; Humans; Image databases; Image segmentation; Neural networks; Signal generators; Signal synthesis; Speech recognition; Speech synthesis
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03). Proceedings
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-7663-3
  • Type
    conf
  • DOI
    10.1109/ICASSP.2003.1200072
  • Filename
    1200072