• DocumentCode
    3703406
  • Title
    Duration refinement for hybrid speech synthesis system using random forest
  • Author
    Ran Zhang;Xiaoyan Lou;Qinghua Wu
  • Author_Institution
    Samsung Telecom R&D Center, Beijing, China
  • fYear
    2015
  • Firstpage
    792
  • Lastpage
    796
  • Abstract
    Hybrid speech synthesis systems, which combine hidden Markov models with unit selection, have recently been widely used and studied in both industry and academia because of their naturalness and expressiveness. However, the target duration, which controls the duration of the selected candidate, is still predicted by a state-based duration model whose performance is far from satisfactory. As a result, the synthetic speech sounds somewhat bland and even tedious. In this paper, we replace the state-based duration model with a Random Forest (RF). Experiments on an English database show that the new model yields more accurate predictions than the baseline state-based duration model: the phone-level RMSE improves by 4.265 ms (14.6%) on average. Perceptual experiments on the same database further confirm that the proposed model performs better than the baseline.
  • Keywords
    "Hidden Markov models","Predictive models","Speech","Training","Speech synthesis","Databases","Acoustics"
  • Publisher
    IEEE
  • Conference_Titel
    Affective Computing and Intelligent Interaction (ACII), 2015 International Conference on
  • Electronic_ISBN
    2156-8111
  • Type
    conf
  • DOI
    10.1109/ACII.2015.7344663
  • Filename
    7344663