A Chinese text to speech system based on TD-PSOLA

Author

Zhu, Yunbo ; Zhao, Li ; Xu, Yunbiao ; Niimi, Yasuhisa

Author_Institution

Dept. of Radio Engineering, Southeast Univ., Nanjing, China

Volume

1

fYear

2002

fDate

28-31 Oct. 2002

Firstpage

204

Abstract

This paper presents the implementation of a Chinese text-to-speech (TTS) system based on the Time Domain Pitch-Synchronous OverLap-Add (TD-PSOLA) approach. To obtain natural synthesized speech, it is necessary to extract pitch-marks precisely for each monosyllabic speech unit, to predict the length of each syllable in the sentence to be synthesized, and to generate F0-contours for their final portion. The paper concentrates on the last two issues: it proposes a scheme for predicting syllable duration, which achieves a relative length error of about 18%, and a scheme for generating the F0-contour. A new pattern-scaling algorithm is proposed to synthesize a given tonal syllable with a desired duration. A preliminary listening test showed that the intelligibility and naturalness of the synthetic speech were good.
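The abstract reports duration-prediction accuracy as a relative length error but does not define the metric. The sketch below shows one common reading, the mean of |predicted - actual| / actual over syllables. This definition is an assumption, not the paper's stated formula; the function name and the sample durations are hypothetical and invented for illustration only.

```python
def mean_relative_length_error(predicted_ms, actual_ms):
    """Mean of |predicted - actual| / actual over paired syllable durations.

    Assumed definition: the abstract does not say how the error is computed.
    """
    if len(predicted_ms) != len(actual_ms) or not actual_ms:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [abs(p - a) / a for p, a in zip(predicted_ms, actual_ms)]
    return sum(errors) / len(errors)


# Hypothetical syllable durations in milliseconds (not data from the paper).
predicted = [220.0, 180.0, 300.0]
actual = [200.0, 200.0, 250.0]

error = mean_relative_length_error(predicted, actual)
print(f"mean relative length error: {error:.1%}")
```

Under this reading, a figure of about 18% would mean that predicted syllable lengths deviate from the measured lengths by roughly 18% on average.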

Keywords

feature extraction; prediction theory; speech intelligibility; speech synthesis; time-domain synthesis; Chinese text to speech system; F0-contour generation; TD-PSOLA; TTS system; Time Domain Pitch-Synchronous OverLap-Add approach; monosyllabic speech unit; natural synthesized speech; pattern-scaling algorithm; pitch-mark extraction; speech intelligibility; syllable duration; syllable length prediction; synthetic speech naturalness; Auditory system; Computer errors; Costs; Information science; Natural languages; Signal synthesis; Speech analysis; Speech synthesis; System performance; Testing;

fLanguage

English

Publisher

IEEE

Conference_Titel

TENCON '02. Proceedings. 2002 IEEE Region 10 Conference on Computers, Communications, Control and Power Engineering

Print_ISBN

0-7803-7490-8

Type

conf

DOI

10.1109/TENCON.2002.1181250

Filename

1181250