  • DocumentCode
    552465
  • Title
    Break prediction of prosody for Hakka's TTS systems based on data mining approaches
  • Author
    Huang, Fong-long; Pan, Neng-Huang; Yu, Ming-shing; Wu, Jun-yi
  • Author_Institution
    Dept. of Comput. Sci. & Inf. Eng., Nat. United Univ., Miaoli, Taiwan
  • Volume
    1
  • fYear
    2011
  • fDate
    10-13 July 2011
  • Firstpage
    51
  • Lastpage
    55
  • Abstract
    This paper addresses prosody generation for the Hakka language based on data mining approaches and implements a Hakka TTS system on the Internet. The system is composed of four components: 1) text analysis, 2) Mandarin-to-Hakka word translation, 3) prosody prediction, and 4) speech generation. More than 2,427 monosyllabic speech units and 2,234 word speech units of Hakka, together with several silences of various durations, have been recorded as basic units for speech synthesis. We focus on inserting breaks into the synthesized speech, with emphasis on predicting the type of each break. Three kinds of breaks between words are distinguished: major break, minor break, and no break. We train a break model and predict breaks using two data mining approaches: a Bayesian network (BN) and a CART classifier. The best test precision, 80.17%, is achieved with CART. Fourteen students familiar with Hakka evaluated the prosody quality of the synthesized speech, giving an average score of 7.54 on a 10-point scale. These evaluation results indicate that the system can synthesize clear and natural Hakka speech.
  • Keywords
    Bayes methods; data mining; natural language processing; pattern classification; speech synthesis; text analysis; Bayesian network; CART classifier; Hakka TTS systems; Hakka language; break prediction; monosyllabic speech units; prosody; speech generation; text to speech system; word translation; Bayesian methods; Cybernetics; Machine learning; Speech; Speech processing; Testing; Prosody Prediction; Text-to-speech (TTS) system
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    2011 International Conference on Machine Learning and Cybernetics (ICMLC)
  • Conference_Location
    Guilin
  • ISSN
    2160-133X
  • Print_ISBN
    978-1-4577-0305-8
  • Type
    conf
  • DOI
    10.1109/ICMLC.2011.6016704
  • Filename
    6016704