• DocumentCode
    2791861
  • Title
    Pronunciation variation generation for spontaneous speech synthesis using state-based voice transformation
  • Author
    Lee, Chung-Han ; Wu, Chung-Hsien ; Guo, Jun-Cheng
  • Author_Institution
    Dept. of Comput. Sci. & Inf. Eng., Nat. Cheng Kung Univ., Tainan, Taiwan
  • fYear
    2010
  • fDate
    14-19 March 2010
  • Firstpage
    4826
  • Lastpage
    4829
  • Abstract
    This study presents an approach to hidden Markov model (HMM)-based spontaneous speech synthesis that incorporates pronunciation variation for better spontaneity. Pronunciation variation commonly occurs in spontaneous speech and plays an important role in expressing spontaneity. In this study, a state-based transformation function is adopted to model the relation between read speech and the corresponding spontaneous speech with pronunciation variations. The transformation function is then used to generate state-based pronunciation variations. Because training data are scarce, the transformation functions are clustered by articulatory features using classification and regression trees (CARTs), so that an unseen pronunciation variation can be generated from the transformation function in the cluster that shares its articulatory features. Objective and subjective tests are conducted to evaluate the performance of the proposed approach. The experimental results show that the proposed transformation function achieves a significant improvement in the spontaneity of synthesized speech.
  • Keywords
    hidden Markov models; regression analysis; speech synthesis; trees (mathematics); CART; classification and regression trees; pronunciation variation; pronunciation variation generation; spontaneous speech synthesis; state-based pronunciation variations; state-based voice transformation; Classification tree analysis; Computer science; Regression tree analysis; Spatial databases; Stochastic processes; Stochastic systems; Testing; Training data; transformation function
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on
  • Conference_Location
    Dallas, TX
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4244-4295-9
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2010.5495140
  • Filename
    5495140