DocumentCode
2852894
Title
Towards a multilingual prosody model for text-to-speech
Author
Jokisch, Oliver ; Ding, Hongwei ; Kruschke, Hans
Author_Institution
Dresden University of Technology, Laboratory of Acoustics and Speech Communication, 01062, Germany
Volume
1
fYear
2002
fDate
13-17 May 2002
Abstract
The generation of prosodic parameters such as F0 contour, duration, and intensity remains an important issue for natural-sounding text-to-speech (TTS), although recently developed TTS systems have made considerable progress. Several suitable but language-specific rule-based, statistical, or data-driven prosody models have been successfully implemented in many systems. Such language- and parameter-dependent models lead to a more complex and less efficient TTS system design. In earlier work, the authors proposed a hybrid data-driven and rule-based model that can be adjusted to different voices or speaking styles by learning and predicting prosodic parameters. The current paper discusses the multilingual generalization of this model and the design of appropriate prosodic databases. Two different languages, German and Mandarin Chinese, are examined as examples. Prediction results and perceptual evaluation with respect to F0 contours and duration values are presented. Since the perceptual results for both languages are comparable and quite satisfactory, the model is suitable for multilingual prosody control. Resynthesis stimuli obtained from modified prosodic parameters partly achieve near-natural mean opinion scores (MOS) above 4.0. The proposed hybrid data-driven and rule-based model is comparatively simple and enables multilingual prosody control in TTS.
Keywords
Shape; Speech; Training
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on
Conference_Location
Orlando, FL, USA
ISSN
1520-6149
Print_ISBN
0-7803-7402-9
Type
conf
DOI
10.1109/ICASSP.2002.5743744
Filename
5743744