  • DocumentCode
    356716
  • Title
    Multimodal speech synthesis
  • Author
    Schroeter, J. ; Ostermann, J. ; Graf, H.P. ; Beutnagel, M. ; Cosatto, E. ; Syrdal, A. ; Conkie, A. ; Stylianou, Y.
  • Author_Institution
    AT&T Labs. Res., Florham Park, NJ, USA
  • Volume
    1
  • fYear
    2000
  • fDate
    2000
  • Firstpage
    571
  • Abstract
    Multimodal speech synthesis (“talking heads”) encompasses synthesis of speech from text (“text-to-speech”, TTS) plus synthesis of a visual presentation of a face that is lip-synced to the generated audio (“visual TTS”, VTTS). Talking heads are now practical because of the ever-increasing computing power and falling prices of computer hardware. This paper highlights recent technological breakthroughs relevant to the two modalities. In addition, it exposes synergies between the audio and visual technology components. Finally, the paper summarizes test results that highlight the impact of multimodal speech synthesis in communications and e-commerce applications.
  • Keywords
    speech synthesis; audio technology components; communications; e-commerce; multimodal speech synthesis; text to speech synthesis; visual face presentation synthesis; visual technology components; Application software; Books; Business; Hardware; Magnetic heads; Speech synthesis; Synthesizers; Testing; Text analysis; Visual databases;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Multimedia and Expo, 2000. ICME 2000. 2000 IEEE International Conference on
  • Conference_Location
    New York, NY
  • Print_ISBN
    0-7803-6536-4
  • Type
    conf
  • DOI
    10.1109/ICME.2000.869666
  • Filename
    869666