• DocumentCode
    519272
  • Title

    A hybrid diphone speech unit and a speech corpus construction technique for a Thai text-to-speech system on mobile devices

  • Author

    Wongpatikaseree, K. ; Ratikan, A. ; Chotimongkol, A. ; Chootrakool, P. ; Nattee, C. ; Theeramunkong, T. ; Kobayashi, T.

  • Author_Institution
    Sirindhorn Int. Inst. of Technol., Thammasat Univ., Pathumthani, Thailand
  • fYear
    2010
  • fDate
    19-21 May 2010
  • Firstpage
    1089
  • Lastpage
    1093
  • Abstract
    Most Thai text-to-speech (TTS) systems on personal computers can synthesize speech in real time with acceptable quality. However, when Thai TTS systems are ported to resource-limited platforms such as mobile devices, computation time has to be reduced, and the quality of the synthesized sound decreases as a result. Even though Flite_Thai, a unit concatenation synthesizer for Thai, reduces computation time enough to run in real time, its output is quite unintelligible. In this paper, we aim to select an appropriate speech unit for Flite_Thai in order to improve its intelligibility. We design a new speech corpus that consists of three different speech units: demi-syllable, diphone, and a new speech unit called hybrid diphone. We use a nonsense carrier sentence technique to record this corpus, since we focus more on clear articulation of each speech unit. Each carrier sentence contains one speech unit, or a set of similar speech units, without regard to meaning. We compare the quality of speech synthesized using four types of speech units: a diphone from the TsynC corpus recorded with natural sentences, and the three types of units from the new corpus recorded with nonsense carrier sentences. In terms of intelligibility, all of the speech units from the new corpus achieved a higher MOS (Mean Opinion Score) than the existing Flite_Thai system, which uses speech units from TsynC. Among the three unit types in the new corpus, demi-syllable obtained the highest score. Although hybrid diphone obtained a higher MOS than both the existing system and the diphone, it still suffers from a similar problem: unsmooth joints between units.
  • Keywords
    mobile radio; natural languages; speech processing; speech synthesis; Flite_Thai; Thai TTS system; Thai text-to-speech system; demi-syllable; hybrid diphone speech unit; mobile devices; nonsense carrier sentence technique; speech corpus construction technique; Electronic mail; High temperature superconductors; Information processing; Microcomputers; Mobile computing; Mobile handsets; Real time systems; Speech processing; Speech synthesis; Synthesizers;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Electrical Engineering/Electronics Computer Telecommunications and Information Technology (ECTI-CON), 2010 International Conference on
  • Conference_Location
    Chiang Mai
  • Print_ISBN
    978-1-4244-5606-2
  • Electronic_ISBN
    978-1-4244-5607-9
  • Type

    conf

  • Filename
    5491644