Multimodal speech synthesis

Author

Schroeter, J. ; Ostermann, J. ; Graf, H.P. ; Beutnagel, M. ; Cosatto, E. ; Syrdal, A. ; Conkie, A. ; Stylianou, Y.

Author_Institution

AT&T Labs. Res., Florham Park, NJ, USA

Volume

1

fYear

2000

fDate

2000

Firstpage

571

Abstract

Multimodal speech synthesis (“talking heads”) encompasses synthesis of speech from text (“text-to-speech”, TTS) plus synthesis of a visual presentation of a face that is lip-synced to the generated audio (“visual TTS”, VTTS). Talking heads are now practical because of ever-increasing computing power and falling prices of computer hardware. This paper highlights recent technological breakthroughs relevant to the two modalities. In addition, it exposes synergies between the audio and visual technology components. Finally, the paper summarizes test results that highlight the impact of multimodal speech synthesis in communications and e-commerce applications.

Keywords

speech synthesis; audio technology components; communications; e-commerce; multimodal speech synthesis; text to speech synthesis; visual face presentation synthesis; visual technology components; Application software; Books; Business; Hardware; Magnetic heads; Speech synthesis; Synthesizers; Testing; Text analysis; Visual databases;

fLanguage

English

Publisher

IEEE

Conference_Titel

Multimedia and Expo, 2000. ICME 2000. 2000 IEEE International Conference on

Conference_Location

New York, NY

Print_ISBN

0-7803-6536-4

Type

conf

DOI

10.1109/ICME.2000.869666

Filename

869666