Title :
Generation of F0 contours using a model-constrained data-driven method
Author :
Sakurai, A. ; Hirose, K. ; Minematsu, N.
Author_Institution :
Texas Instruments Japan Ltd, Tsukuba, Japan
Abstract :
Introduces a model-constrained, data-driven method for generating fundamental frequency (F0) contours in Japanese text-to-speech synthesis. In the training phase, the parameters of a command-response F0 contour generation model are learned by a prediction module, which can be a neural network or a set of binary regression trees. The input features consist of linguistic information about accentual phrases that can be derived automatically from text, such as the position of the accentual phrase in the utterance, the number of morae, the accent type, and the part of speech. In the synthesis phase, the prediction module is used to generate appropriate values of the model parameters. The use of the parametric model restricts the degrees of freedom of the problem, facilitating data-driven learning. Experimental results show that the method makes it possible to generate quite natural F0 contours with a relatively small training database.
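The abstract gives no equations, so the following is only a minimal sketch of how a command-response model can turn a handful of predicted parameters into an F0 contour. It assumes the model is the standard Fujisaki formulation, in which log F0 is a baseline plus the responses to phrase commands (impulses) and accent commands (steps). The constants ALPHA, BETA, and GAMMA are commonly quoted values rather than values from this record, and the command values in the usage lines are hypothetical stand-ins for what the prediction module would output.

```python
import math

# Assumed constants (commonly quoted for the Fujisaki model, not from this record)
ALPHA = 3.0   # natural angular frequency of the phrase control mechanism (1/s)
BETA = 20.0   # natural angular frequency of the accent control mechanism (1/s)
GAMMA = 0.9   # ceiling of the accent component


def phrase_response(t, alpha=ALPHA):
    """Impulse response of the phrase control mechanism; zero before onset."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def accent_response(t, beta=BETA, gamma=GAMMA):
    """Step response of the accent control mechanism, clipped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def f0_contour(times, base_f0, phrase_cmds, accent_cmds):
    """Return F0 values (Hz) at the given times (s).

    phrase_cmds: list of (onset_time, magnitude)
    accent_cmds: list of (onset_time, offset_time, amplitude)
    """
    contour = []
    for t in times:
        log_f0 = math.log(base_f0)
        for t0, ap in phrase_cmds:
            log_f0 += ap * phrase_response(t - t0)
        for t1, t2, aa in accent_cmds:
            log_f0 += aa * (accent_response(t - t1) - accent_response(t - t2))
        contour.append(math.exp(log_f0))
    return contour


# Hypothetical parameters for one accentual phrase, sampled every 10 ms
times = [i * 0.01 for i in range(150)]
f0 = f0_contour(times, base_f0=120.0,
                phrase_cmds=[(0.0, 0.5)],
                accent_cmds=[(0.2, 0.6, 0.4)])
```

Under this formulation each accentual phrase contributes only a few numbers (command timings and magnitudes), which illustrates why constraining the problem with a parametric model leaves far fewer degrees of freedom than predicting F0 frame by frame.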
Keywords :
filtering theory; multilayer perceptrons; natural languages; prediction theory; recurrent neural nets; speech synthesis; statistical analysis; F0 contour generation; Japanese text-to-speech synthesis; accentual phrases; binary regression trees; data-driven learning; fundamental frequency contours; linguistic information; model-constrained data-driven method; neural network; parametric model; prediction module; synthesis phase; Equations; Knowledge based systems; Nonlinear filters; Testing
Conference_Title :
Acoustics, Speech, and Signal Processing, 2001. Proceedings. (ICASSP '01). 2001 IEEE International Conference on
Conference_Location :
Salt Lake City, UT
Print_ISBN :
0-7803-7041-4
DOI :
10.1109/ICASSP.2001.941040