DocumentCode :
356691
Title :
Prosody model in a Mandarin text-to-speech system based on a hierarchical approach
Author :
Pan, Neng-Huang ; Jen, Wen-Tsai ; Yu, Shyr-Shen ; Yu, Ming-shing ; Huang, Shyh-Yang ; Wu, Ming-Jer
Author_Institution :
Dept. of Appl. Math., Nat. Chung-Hsing Univ., Taichung, Taiwan
Volume :
1
fYear :
2000
fDate :
2000
Firstpage :
448
Abstract :
The authors developed a prosody model for a Mandarin text-to-speech (TTS) system. Meaningful parameters are extracted from the voice files and text files in a hierarchical way. For each syllable, four parameters are considered (five in the duration prediction model): information of the word (consonant, vowel and tone); information of the phrase; information of the breath group; and information of the sentence (the duration model adds the punctuation mark). In the syllable duration prediction model, 37% of the training syllables in the inside test and 43% of the test syllables in the outside test have a prediction error ratio below 0.1. The average error over all syllables is 0.182 in the inside test and 0.169 in the outside test. In the syllable volume prediction model, 81% of the training syllables in the inside test and 76.2% of the test syllables in the outside test have a prediction error ratio below 0.1. The average error over all syllables is 0.176 in the inside test and 0.166 in the outside test. For the pitch prediction module, 64% of the internal samples and 57% of the external samples have a pattern error within 5 Hz. The average pattern error over all syllables is 5 Hz in the inside test and 6 Hz in the outside test.
Keywords :
natural languages; performance evaluation; speech synthesis; text analysis; Mandarin text-to-speech system; average error; average pattern error; breath group; duration prediction model; external samples; hierarchical approach; internal samples; outside test; pattern error; performance evaluation; phrase; pitch prediction module; prediction error; prosody model; punctuation mark; sentences; syllable duration prediction model; syllable volume prediction model; text files; training syllables; voice files; word information; Data mining; Mathematical model; Mathematics; Predictive models; Speech analysis; Speech synthesis; Statistics; Synthesizers; Testing; Text analysis;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Multimedia and Expo, 2000. ICME 2000. 2000 IEEE International Conference on
Conference_Location :
New York, NY
Print_ISBN :
0-7803-6536-4
Type :
conf
DOI :
10.1109/ICME.2000.869636
Filename :
869636