• DocumentCode
    3421185
  • Title

    Control of prosodic focus in corpus-based generation of fundamental frequency contours based on the generation process model

  • Author

    Hirose, Keikichi ; Ochi, Keiko ; Minematsu, Nobuaki

  • Author_Institution
    Dept. of Inf. & Commun. Eng., Univ. of Tokyo, Tokyo, Japan
  • fYear
    2010
  • fDate
    24-28 Oct. 2010
  • Firstpage
    629
  • Lastpage
    632
  • Abstract
    HMM-based speech synthesis is known to be a possible solution for realizing "flexibility" in speech synthesis. However, its frame-by-frame processing of acoustic features is not appropriate for prosodic features. Prosodic features cover a wider time span than segmental features and should be handled differently. From this point of view, a method has been developed for generating sentence F0 contours based on the generation process model, which models sentence F0 contours on a logarithmic scale as superpositions of phrase and accent components. These components are further represented as responses to discrete commands, which are closely related to the linguistic and para-/non-linguistic information of sentences. By predicting the model commands instead of frame-by-frame F0 values, flexible and robust F0 control can be realized. As an example of flexible control, a method is developed for generating sentence F0 contours of Japanese when a focus is placed on one of the "bunsetsu" of an utterance. The method first predicts differences in the F0 model commands between utterances with and without focus, and then applies them to the F0 model commands predicted beforehand by the baseline method without focus assignment. The baseline method is trained using a large corpus, while the corpus for training command differences can be small and need not be uttered by the same speaker as the large corpus. The validity of the method was confirmed by an experiment on F0 contour generation and speech synthesis, including interpolation/extrapolation of the F0 model commands for focus level control.
  • Keywords
    acoustic focusing; extrapolation; hidden Markov models; interpolation; speech synthesis; HMM-based speech synthesis; accent components; acoustic features; contour generation; corpus-based generation; discrete commands; extrapolation; focus assignment; focus level control; frame-by-frame process; fundamental frequency contours; generation process model; interpolation; logarithmic scale; model commands; prosodic features; prosodic focus; segmental features; training command differences; Hidden Markov models; Pragmatics; Predictive models; Process control; Speech; Speech synthesis; Training; Corpus-based method; F0 contour; Generation process model; Prosodic focus; Speech synthesis
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signal Processing (ICSP), 2010 IEEE 10th International Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4244-5897-4
  • Type

    conf

  • DOI
    10.1109/ICOSP.2010.5656835
  • Filename
    5656835