• DocumentCode
    352468
  • Title
    From speech to talking faces: lip movements estimation based on linear approximators
  • Author
    Vignoli, F.
  • Author_Institution
    Genoa Univ.
  • Volume
    6
  • fYear
    2000
  • fDate
    2000
  • Firstpage
    2381
  • Abstract
    In human communication, speech understanding is greatly improved by the bimodal acoustic-visual effect, compared with speech alone. This is particularly clear when the communication takes place in noisy environments or involves non-native speakers. In this paper, we propose a novel algorithm based on linear approximators that estimates the lip movements from a timed sequence of phonemes. This sequence can be generated from real speech, by a segmentation technique based on a hidden Markov model (HMM), or from a text-to-speech system. The algorithm consists of two modules: the training module and the synthesis module. The training module is based on an eigen-analysis of an audiovisual database recorded for this purpose. The synthesis module takes as input the sequence of phonemes and implements an implicit coarticulation model. A later post-processing step converts the estimated parameters into a sequence of facial animation parameters that are compliant with the new MPEG-4 standard. The algorithm has been tested with FAE (Facial Animation Engine), which is an MPEG-4 compliant system developed at the author's university.
  • Keywords
    approximation theory; audio-visual systems; code standards; computer animation; eigenvalues and eigenfunctions; face recognition; hidden Markov models; learning systems; motion estimation; parameter estimation; sequences; speech intelligibility; speech synthesis; subroutines; FAE; Facial Animation Engine; MPEG-4 compliant system; audiovisual database; bimodal acoustic-visual effect; eigen-analysis; facial animation parameter sequence; hidden Markov model; human communication; implicit coarticulation model; linear approximators; lip movement estimation; noisy environments; nonnative speakers; parameter estimation; post-processing; speech segmentation technique; speech synthesis module; speech understanding; talking faces; text-to-speech system; timed phoneme sequence; training module; Audio databases; Facial animation; Hidden Markov models; Humans; Linear approximation; Loudspeakers; MPEG 4 Standard; Parameter estimation; Speech synthesis; Working environment noise;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 2000. ICASSP '00. Proceedings. 2000 IEEE International Conference on
  • Conference_Location
    Istanbul
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-6293-4
  • Type
    conf
  • DOI
    10.1109/ICASSP.2000.859320
  • Filename
    859320