Title :
Modeling prosody patterns for Chinese expressive text-to-speech synthesis
Author :
Wu, Zhiyong ; Cai, Lianhong ; Meng, Helen M.
Author_Institution :
Tsinghua-CUHK Joint Res. Center for Media Sci., Tsinghua Univ., Shenzhen, China
Date :
Nov. 29 2010-Dec. 3 2010
Abstract :
This paper proposes an approach for modeling the prosody patterns of the acoustic features for Chinese expressive text-to-speech (TTS) synthesis. Based on the observation that a speaker tends to place more emphasis on one particular syllable within a multi-syllabic prosodic word, we identify this syllable as the core syllable, which can be derived from the semantic stress and tone information of the text prompt. We then classify the syllables in speech into four classes, based on their relations to the core syllable in a prosodic word. We analyze contrastive (neutral versus expressive) speech recordings for each of the four classes, and develop a perturbation model that takes the prosody pattern into account to transform neutral speech into expressive speech. Perceptual experiments on both neutral speech recordings and neutral TTS outputs, involving 13 subjects, indicate that the proposed approach can significantly enhance the expressivity of the synthesized speech.
Keywords :
natural language processing; speaker recognition; speech synthesis; text analysis; Chinese expressive text-to-speech synthesis; acoustic features; contrastive speech recordings; multisyllabic prosodic word; neutral speech; perturbation model; prosody patterns; semantic stress; speaker; text prompt; Acoustics; Hidden Markov models; Semantics; Speech; Speech synthesis; Stress; expressive text-to-speech (TTS); non-linear perturbation model; prosody pattern
Conference_Title :
Chinese Spoken Language Processing (ISCSLP), 2010 7th International Symposium on
Conference_Location :
Tainan
Print_ISBN :
978-1-4244-6244-5
DOI :
10.1109/ISCSLP.2010.5684494