Title :
Control of prosodic focus in corpus-based generation of fundamental frequency contours based on the generation process model
Author :
Hirose, Keikichi ; Ochi, Keiko ; Minematsu, Nobuaki
Author_Institution :
Dept. of Inf. & Commun. Eng., Univ. of Tokyo, Tokyo, Japan
Abstract :
HMM-based speech synthesis is known to be a possible solution for realizing "flexibility" in speech synthesis. However, its frame-by-frame processing of acoustic features is not appropriate for prosodic features. Prosodic features span longer time intervals than segmental features and should be handled differently. From this point of view, a method has been developed for generating sentence F0 contours based on the generation process model, which models sentence F0 contours on a logarithmic scale as superpositions of phrase and accent components. These components are further represented as responses to discrete commands, which are closely related to the linguistic and para-/non-linguistic information of sentences. By predicting the model commands instead of frame-by-frame F0 values, flexible and robust F0 control can be realized. As an example of flexible control, a method is developed for generating sentence F0 contours of Japanese when a focus is placed on one of the "bunsetsu" of an utterance. The method first predicts the differences in the F0 model commands between utterances with and without focus, and then applies them to the F0 model commands predicted beforehand by the baseline method without focus assignment. The baseline method is trained using a large corpus, while the corpus for training the command differences can be small and need not be uttered by the same speaker as the large corpus. The validity of the method was confirmed by experiments on F0 contour generation and speech synthesis, including interpolation/extrapolation of the F0 model commands for focus level control.
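The superposition and the focus adjustment described in the abstract can be illustrated with a minimal sketch. It uses the commonly cited formulation of the generation process model (Fujisaki model), in which ln F0(t) is the log baseline frequency plus phrase components (impulse responses) and accent components (step responses). Everything specific here is an assumption rather than something stated in the record: the constants `ALPHA`, `BETA`, and `GAMMA` are typical textbook values, the function names are hypothetical, the command values are made up, and the focus step is simplified to scaling amplitude differences only (the paper's predicted differences may also involve command timing).

```python
import math

# Assumed typical constants for the model; not taken from the record.
ALPHA, BETA, GAMMA = 3.0, 20.0, 0.9


def phrase_response(t, alpha=ALPHA):
    """Impulse response of the phrase control mechanism (zero before onset)."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def accent_response(t, beta=BETA, gamma=GAMMA):
    """Step response of the accent control mechanism, capped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def log_f0(t, fb, phrase_cmds, accent_cmds):
    """ln F0(t) = ln Fb + sum of phrase components + sum of accent components.

    phrase_cmds: list of (onset_time, magnitude)
    accent_cmds: list of (onset_time, offset_time, amplitude)
    """
    value = math.log(fb)
    for onset, magnitude in phrase_cmds:
        value += magnitude * phrase_response(t - onset)
    for onset, offset, amplitude in accent_cmds:
        value += amplitude * (accent_response(t - onset) - accent_response(t - offset))
    return value


def apply_focus(baseline_amps, amp_deltas, level=1.0):
    """Add scaled command differences to baseline command amplitudes.

    level = 0 gives the baseline, level = 1 the predicted focus,
    0 < level < 1 interpolates, and level > 1 extrapolates.
    """
    return [a + level * d for a, d in zip(baseline_amps, amp_deltas)]


# Hypothetical commands: one phrase command and two accent commands.
phrase_cmds = [(0.0, 0.5)]
accent_times = [(0.2, 0.5), (0.7, 1.0)]
baseline_amps = [0.4, 0.3]
amp_deltas = [0.2, -0.1]  # made-up differences: focus on the first accent

focused_amps = apply_focus(baseline_amps, amp_deltas, level=1.0)
focused_cmds = [(on, off, a) for (on, off), a in zip(accent_times, focused_amps)]
f0_at_focus = math.exp(log_f0(0.4, 100.0, phrase_cmds, focused_cmds))
```

Because the adjustment acts on a handful of command values rather than on per-frame F0, changing `level` is enough to weaken or exaggerate the focus in this sketch.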
Keywords :
acoustic focusing; extrapolation; hidden Markov models; interpolation; speech synthesis; HMM-based speech synthesis; accent components; acoustic features; contour generation; corpus-based generation; discrete commands; extrapolation; focus assignment; focus level control; frame-by-frame process; fundamental frequency contours; generation process model; interpolation; logarithmic scale; model commands; prosodic features; prosodic focus; segmental features; training command differences; Hidden Markov models; Pragmatics; Predictive models; Process control; Speech; Speech synthesis; Training; Corpus-based method; F0 contour; Generation process model; Prosodic focus; Speech synthesis;
Conference_Title :
Signal Processing (ICSP), 2010 IEEE 10th International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-5897-4
DOI :
10.1109/ICOSP.2010.5656835