• DocumentCode
    2872011
  • Title
    Triphone based unit selection for concatenative visual speech synthesis
  • Author
    Huang, Fu Jie ; Cosatto, Eric ; Graf, Hans Peter
  • Author_Institution
    AT&T Labs-Research, USA
  • Volume
    2
  • fYear
    2002
  • fDate
    13-17 May 2002
  • Abstract
    Concatenative visual speech synthesis selects frames from a large recorded video database of mouth shapes to generate photo-realistic talking head sequences. The synthesized sequence must exhibit precise lip-sound synchronization and smooth articulation. The selection process for finding the best lip shapes has been computationally expensive [1], limiting synthesis to speeds far below real time. In this paper, we propose a rapid unit selection approach based on triphone units. Experiments show that this algorithm makes the synthesis, excluding rendering, 50 times faster than real time on a standard desktop PC. We also develop an objective metric for the quality of the synthesis and show that it is consistent with subjective evaluation results.
  • Keywords
    Computers; Manuals; Nonvolatile memory; Speech; Speech processing; Switches; Visualization
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on
  • Conference_Location
    Orlando, FL, USA
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-7402-9
  • Type
    conf
  • DOI
    10.1109/ICASSP.2002.5745033
  • Filename
    5745033