Title :
Corpus building for data-driven TTS systems
Author :
Zhu, Weibin ; Zhang, Wei ; Shi, Qin ; Chen, Fangxin ; Li, Haiping ; Ma, Xijun ; Shen, Liqin
Author_Institution :
IBM China Res. Lab, Beijing, China
Abstract :
To build a data-driven Mandarin TTS system, we constructed a large, balanced Mandarin text-and-speech corpus, the IBM Mandarin TTS Corpus. The corpus is designed for both statistical prosody modeling and the context dependence of phonemic features. In the script-design stage, we investigated the choice of a proper synthetic unit. Based on that choice, we developed a numerical criterion for the coverage and balance of the variants of the synthetic units. In the speech-recording stage, we paid attention to speaking style, which is essential for an effective concatenative speech synthesis system. We formulated a specification of speaking style and guided the speaker to follow it strictly. Corpus processing is another important step, in which we carefully carried out pronunciation marking, segment alignment, and prosodic event labeling, among other tasks. We defined a set of hierarchical prosodic layers to describe various prosodic events. Because these steps often involve manual effort, the quality of the processed corpus depends on both proper specifications for each step and the training of the operating team.
Keywords :
speech processing; speech synthesis; statistical analysis; IBM Mandarin TTS Corpus; concatenative speech synthesis; context dependence; corpus building; corpus processing; coverage; data-driven TTS systems; numerical criterion; phonemic features; pronunciation marking; prosodic events labeling; prosodic hierarchical layers; script design; segment aligning; speaking style; speech recording; statistical prosody modeling; synthetic unit; variant balance; Concatenated codes; Context modeling; Degradation; Humans; Predictive models; Signal processing; Signal synthesis; Spatial databases
Conference_Title :
Proceedings of 2002 IEEE Workshop on Speech Synthesis, 2002
Print_ISBN :
0-7803-7395-2
DOI :
10.1109/WSS.2002.1224408