• DocumentCode
    3744830
  • Title
    Automatic prosody prediction for Chinese speech synthesis using BLSTM-RNN and embedding features
  • Author
    Chuang Ding;Lei Xie;Jie Yan;Weini Zhang;Yang Liu
  • Author_Institution
    School of Computer Science, Northwestern Polytechnical University, Xi'an, China
  • fYear
    2015
  • Firstpage
    98
  • Lastpage
    102
  • Abstract
    Prosody affects the naturalness and intelligibility of speech. However, automatic prosody prediction from text for Chinese speech synthesis remains a major challenge, and the traditional conditional random field (CRF) based method relies heavily on feature engineering. In this paper, we propose using neural networks to predict prosodic boundary labels directly from Chinese characters without any feature engineering. Experimental results show that stacking feed-forward and bidirectional long short-term memory (BLSTM) recurrent network layers outperforms the CRF-based method. Embedding features learned from raw text further improve performance.
  • Keywords
    "Neural networks","Speech","Training","Logic gates","Tagging","Speech synthesis","Hidden Markov models"
  • Publisher
    IEEE
  • Conference_Titel
    Automatic Speech Recognition and Understanding (ASRU), 2015 IEEE Workshop on
  • Type
    conf
  • DOI
    10.1109/ASRU.2015.7404780
  • Filename
    7404780