  • DocumentCode
    1464201
  • Title
    Exploiting Prosody Hierarchy and Dynamic Features for Pitch Modeling and Generation in HMM-Based Speech Synthesis
  • Author
    Hsia, Chi-Chun ; Wu, Chung-Hsien ; Wu, Jung-Yun
  • Author_Institution
    ICT-Enabled Healthcare Program, Ind. Technol. Res. Inst.-South, Tainan, Taiwan
  • Volume
    18
  • Issue
    8
  • fYear
    2010
  • Firstpage
    1994
  • Lastpage
    2003
  • Abstract
    This paper proposes a method for modeling and generating pitch in hidden Markov model (HMM)-based Mandarin speech synthesis by exploiting prosody hierarchy and dynamic pitch features. The prosodic structure of a sentence is represented by a prosody hierarchy, which is constructed from prosodic breaks predicted by a supervised classification and regression tree (S-CART). The S-CART is trained by maximizing the proportional reduction of entropy to minimize errors in the prediction of prosodic breaks. The pitch contour of a spoken sentence is estimated using the STRAIGHT algorithm and decomposed into prosodic features (static features) at the prosodic word, syllable, and frame layers, based on the predicted prosodic structure. Dynamic features at each layer are estimated to preserve the temporal correlation between adjacent units. A hierarchical prosody model is constructed using an unsupervised CART (U-CART) for generating the pitch contour, with minimum description length (MDL) adopted in U-CART training. Objective and subjective evaluations with statistical hypothesis testing were conducted, and the results were compared with the corresponding results for HMM-based pitch modeling. The comparison confirms the improved performance of the proposed method.
  • Keywords
    entropy; hidden Markov models; regression analysis; speech synthesis; HMM-based speech synthesis; Mandarin speech synthesis; dynamic features; entropy reduction; hidden Markov model; minimum description length; pitch modeling; prosody hierarchy; regression tree; supervised classification; Classification tree analysis; Computer industry; Computer science; Cost function; Medical services; Regression tree analysis; Testing; hidden Markov model (HMM)-based speech synthesis; pitch modeling and generation
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TASL.2010.2040791
  • Filename
    5443736