DocumentCode
356716
Title
Multimodal speech synthesis
Author
Schroeter, J. ; Ostermann, J. ; Graf, H.P. ; Beutnagel, M. ; Cosatto, E. ; Syrdal, A. ; Conkie, A. ; Stylianou, Y.
Author_Institution
AT&T Labs. Res., Florham Park, NJ, USA
Volume
1
fYear
2000
fDate
2000
Firstpage
571
Abstract
Multimodal speech synthesis (“talking heads”) encompasses synthesis of speech from text (“text-to-speech”, TTS) plus synthesis of a visual presentation of a face that is lip-synced to the generated audio (“visual TTS”, VTTS). Talking heads are now practical because of the ever-increasing computing power and falling prices of computer hardware. This paper highlights recent technological breakthroughs relevant to the two modalities. In addition, it exposes synergies between the audio and visual technology components. Finally, the paper summarizes test results that highlight the impact of multimodal speech synthesis in communications and e-commerce applications.
Keywords
speech synthesis; audio technology components; communications; e-commerce; multimodal speech synthesis; text to speech synthesis; visual face presentation synthesis; visual technology components; Application software; Books; Business; Hardware; Magnetic heads; Speech synthesis; Synthesizers; Testing; Text analysis; Visual databases;
fLanguage
English
Publisher
IEEE
Conference_Titel
Multimedia and Expo, 2000. ICME 2000. 2000 IEEE International Conference on
Conference_Location
New York, NY
Print_ISBN
0-7803-6536-4
Type
conf
DOI
10.1109/ICME.2000.869666
Filename
869666