DocumentCode :
2052563
Title :
An improved method for predicting fundamental frequency contour in Mandarin text-to-speech system with a small corpus
Author :
Wang, Liang ; Zhu, Jie ; LV, Yao
Author_Institution :
Dept. of Electron. Eng., Shanghai Jiao Tong Univ., Shanghai, China
fYear :
2010
fDate :
21-24 Nov. 2010
Firstpage :
751
Lastpage :
754
Abstract :
This paper proposes a method for predicting the fundamental frequency contour in a Mandarin text-to-speech system built on a small corpus. To avoid large modifications to the speech clips, two corpora are first established: a tonal syllable corpus and a high-frequency word corpus. Two rules are then applied to predict the pitch contour of speech. First, the traditional Fujisaki model is modified to suit the small corpus. Second, pitch jitter is simulated with a model based on a Gaussian mixture model (GMM). The pitch of the speech clips is then adjusted by the PSOLA algorithm according to the fundamental frequency contour predicted by the modified Fujisaki model and the jitter model, which improves the prosody of the synthesized speech and makes it sound more natural. Experiments demonstrate that the method is effective for a Mandarin text-to-speech system based on a small corpus.
Keywords :
speech synthesis; GMM; Mandarin text-to-speech system; PSOLA algorithm; fundamental frequency contour prediction; high-frequency word corpus; modified Fujisaki model; pitch contour; pitch jitter; speech clips; tonal syllable corpus; Fujisaki model; Text-to-speech; fundamental frequency contour; jitter; Mandarin; small corpus
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
TENCON 2010 - 2010 IEEE Region 10 Conference
Conference_Location :
Fukuoka
ISSN :
pending
Print_ISBN :
978-1-4244-6889-8
Type :
conf
DOI :
10.1109/TENCON.2010.5686602
Filename :
5686602