• DocumentCode
    1843650
  • Title
    Use of generation process model for synthesizing fundamental frequency contours in HMM-based speech synthesis
  • Author
    Hirose, Keikichi ; Hashimoto, Hiroya ; Ikeshima, J. ; Minematsu, Nobuaki
  • Author_Institution
    Dept. of Inf. & Commun. Eng., Univ. of Tokyo, Tokyo, Japan
  • Volume
    1
  • fYear
    2012
  • fDate
    21-25 Oct. 2012
  • Firstpage
    575
  • Lastpage
    578
  • Abstract
    The generation process model of fundamental frequency contours is well suited to representing global features of prosody. It is a command-response model in which the commands have clear relations with the linguistic and para-/non-linguistic information conveyed by the utterance. Therefore, by handling fundamental frequency contours in the framework of the generation process model, prosody control with increased flexibility becomes possible in speech synthesis. The model can also be used to solve problems of HMM-based speech synthesis that arise from the frame-by-frame treatment of fundamental frequencies. Two approaches are possible: one applied before the training process and one applied after the generation process. The former suppresses unnatural fundamental frequency movements in the speech used for HMM training, and the latter reshapes the fundamental frequency contours generated by HMM-based speech synthesis. A method of prosody conversion is also developed, which looks at the differences in model commands between the original and target styles. The method enables flexible control of fundamental frequency contours in speech synthesis.
  • Keywords
    computational linguistics; hidden Markov models; interference suppression; speech synthesis; HMM-based speech synthesis; command response model; frame-by-frame treatment; fundamental frequency contour synthesis; generation process model; global feature representation; para-non linguistic information; prosody control; prosody conversion; unnatural fundamental frequency movements suppression; flexible control of prosody; fundamental frequency contour
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signal Processing (ICSP), 2012 IEEE 11th International Conference on
  • Conference_Location
    Beijing
  • ISSN
    2164-5221
  • Print_ISBN
    978-1-4673-2196-9
  • Type
    conf
  • DOI
    10.1109/ICoSP.2012.6491554
  • Filename
    6491554