Corpus building for data-driven TTS systems

Author

Zhu, Weibin ; Zhang, Wei ; Shi, Qin ; Chen, Fangxin ; Li, Haiping ; Ma, Xijun ; Shen, Liqin

Author_Institution

IBM China Research Lab, Beijing, China

fYear

2002

fDate

11-13 Sept. 2002

Firstpage

199

Lastpage

202

Abstract

To build a data-driven Mandarin TTS system, we constructed a large, balanced Mandarin text-and-speech corpus named the IBM Mandarin TTS Corpus. The corpus is designed to support both statistical prosody modeling and the modeling of context dependence in phonemic features. In the script-design stage, we investigated which synthetic unit is appropriate. Based on the chosen synthetic unit, we developed a numerical criterion for the coverage and balance of the unit's variants. In the speech-recording stage, we paid attention to speaking style, which is essential for an effective concatenative speech synthesis system. We formulated a specification of speaking style and guided the speaker to follow it strictly. Corpus processing is another important step, in which we carefully carried out tasks such as pronunciation marking, segment alignment, and prosodic event labeling. We defined a set of hierarchical prosodic layers to describe various prosodic events. Because these tasks often involve manual effort, the quality of the processed corpus depends on both proper specifications for each step and the training of the operating team.

Keywords

speech processing; speech synthesis; statistical analysis; IBM Mandarin TTS Corpus; concatenative speech synthesis; context dependence; corpus building; corpus processing; coverage; data-driven TTS systems; numerical criterion; phonemic features; pronunciation marking; prosodic events labeling; prosodic hierarchical layers; script design; segment aligning; speaking style; speech recording; statistical prosody modeling; synthetic unit; variant balance; Concatenated codes; Context modeling; Degradation; Humans; Predictive models; Signal processing; Signal synthesis; Spatial databases

fLanguage

English

Publisher

IEEE

Conference_Titel

Proceedings of the 2002 IEEE Workshop on Speech Synthesis

Print_ISBN

0-7803-7395-2

Type

conf

DOI

10.1109/WSS.2002.1224408

Filename

1224408