DocumentCode :
396795
Title :
Statistic prosody structure prediction
Author :
Shi, Qin ; Ma, Xijun ; Zhu, Weibin ; Zhang, Wei ; Shen, Liqin
Author_Institution :
Speech Technol. Group, IBM China Res. Lab., Beijing, China
fYear :
2002
fDate :
11-13 Sept. 2002
Firstpage :
155
Lastpage :
158
Abstract :
Hierarchical prosody structure generation is a key component of a speech synthesis system. This paper presents a statistical method that predicts the prosody structure for a Chinese text-to-speech (TTS) system by combining dynamic programming with rules. The method is based on a manually annotated corpus extracted from natural speech (IBM Mandarin TTS Corpus for Female 02). Experimental results show that the prosodic structure can be predicted with an accuracy of 91.2%. A state-of-the-art Mandarin TTS system has been built on the hierarchical prosody structure, and listening tests indicate that the predicted prosody structure performs well.
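The abstract says only that dynamic programming is combined with rules. The Python sketch below shows one way such a combination can work, and it rests on assumptions that do not come from the paper. It makes only a two-level decision (prosodic-phrase break or no break) instead of building the full hierarchy. The per-juncture break probabilities are given directly, whereas in practice a model trained on an annotated corpus would supply them. A hypothetical syllable-length rule stands in for the paper's rules. The names `predict_phrase_breaks`, `break_prob`, `min_len` and `max_len` are illustrative only.

```python
import math
from typing import List, Sequence


def predict_phrase_breaks(
    syllables: Sequence[int],
    break_prob: Sequence[float],
    min_len: int = 2,
    max_len: int = 9,
) -> List[int]:
    """Choose prosodic-phrase breaks by dynamic programming (illustrative sketch).

    syllables[i]    -- syllable count of prosodic word i (hypothetical input)
    break_prob[k]   -- assumed model probability of a break at juncture k,
                       i.e. after word k, for k = 0 .. n-2
    min_len/max_len -- hypothetical rule: each phrase must contain between
                       min_len and max_len syllables
    Returns the sorted juncture indices where a break is placed.
    """
    n = len(syllables)
    if len(break_prob) != max(n - 1, 0):
        raise ValueError("need one break probability per internal juncture")
    eps = 1e-9  # clamp probabilities so log() never receives 0
    clamped = [min(max(p, eps), 1 - eps) for p in break_prob]
    log_b = [math.log(p) for p in clamped]       # score of placing a break
    log_nb = [math.log(1 - p) for p in clamped]  # score of not placing one

    # Prefix sums: syl_pre[k] = syllables in words 0..k-1,
    # nb_pre[k] = summed "no break" log-scores of junctures 0..k-1.
    syl_pre = [0]
    for s in syllables:
        syl_pre.append(syl_pre[-1] + s)
    nb_pre = [0.0]
    for v in log_nb:
        nb_pre.append(nb_pre[-1] + v)

    # best[i]: best log-score for splitting words 0..i-1 into whole phrases.
    best = [-math.inf] * (n + 1)
    back = [-1] * (n + 1)
    best[0] = 0.0
    for i in range(1, n + 1):
        for j in range(i):  # candidate phrase = words j..i-1
            if best[j] == -math.inf:
                continue
            length = syl_pre[i] - syl_pre[j]
            if not (min_len <= length <= max_len):  # rule prunes this phrase
                continue
            score = best[j] + (nb_pre[i - 1] - nb_pre[j])  # no breaks inside
            if i < n:
                score += log_b[i - 1]  # the phrase ends with a break
            if score > best[i]:
                best[i], back[i] = score, j
    if best[n] == -math.inf:
        raise ValueError("no segmentation satisfies the length rule")

    # Follow back-pointers: a phrase that starts at word j > 0 implies a
    # break at juncture j - 1.
    breaks = []
    i = n
    while i > 0:
        j = back[i]
        if j > 0:
            breaks.append(j - 1)
        i = j
    return sorted(breaks)
```

For example, `predict_phrase_breaks([2, 2, 1, 2, 2, 3], [0.1, 0.8, 0.2, 0.7, 0.3], max_len=4)` places breaks at junctures 1, 3 and 4. The low-probability break at juncture 4 is forced because the rule forbids a 5-syllable final phrase. In this sketch the rules act as hard constraints that prune the search space, and the statistical scores rank whatever survives. Treating the rules as soft penalties added to the score would be an equally plausible design.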
Keywords :
dynamic programming; speech processing; speech synthesis; statistical analysis; Chinese language; IBM Mandarin TTS Corpus for Female 02; TTS system; dynamic program method; hierarchical prosody structure generation; manually annotated corpus; rules; statistic prosody structure prediction; text-to-speech system; Buildings; Data mining; Electronic mail; Natural languages; Rhythm; Speech synthesis; Statistics; Tagging; Testing; Training data;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Proceedings of the 2002 IEEE Workshop on Speech Synthesis
Print_ISBN :
0-7803-7395-2
Type :
conf
DOI :
10.1109/WSS.2002.1224397
Filename :
1224397