Title :
Multimodal speech synthesis
Author :
Schroeter, J. ; Ostermann, J. ; Graf, H.P. ; Beutnagel, M. ; Cosatto, E. ; Syrdal, A. ; Conkie, A. ; Stylianou, Y.
Author_Institution :
AT&T Labs. Res., Florham Park, NJ, USA
Abstract :
Multimodal speech synthesis (“talking heads”) encompasses synthesis of speech from text (“text-to-speech”, TTS) plus synthesis of a visual presentation of a face that is lip-synced to the generated audio (“visual TTS”, VTTS). Talking heads are now practical because of the ever-increasing computing power and falling prices of computer hardware. This paper highlights recent technological breakthroughs relevant to the two modalities. In addition, it exposes synergies between the audio and visual technology components. Finally, the paper summarizes test results that highlight the impact of multimodal speech synthesis in communications and e-commerce applications.
Keywords :
speech synthesis; audio technology components; communications; e-commerce; multimodal speech synthesis; text to speech synthesis; visual face presentation synthesis; visual technology components; Application software; Books; Business; Hardware; Magnetic heads; Speech synthesis; Synthesizers; Testing; Text analysis; Visual databases;
Conference_Title :
Multimedia and Expo, 2000. ICME 2000. 2000 IEEE International Conference on
Conference_Location :
New York, NY
Print_ISBN :
0-7803-6536-4
DOI :
10.1109/ICME.2000.869666