• DocumentCode
    1360552
  • Title
    A Hybrid Text-to-Speech System That Combines Concatenative and Statistical Synthesis Units
  • Author
    Tiomkin, S.; Malah, David; Shechtman, Slava; Kons, Zvi
  • Author_Institution
    Dept. of Electr. Eng., Technion - Israel Inst. of Technol., Haifa, Israel
  • Volume
    19
  • Issue
    5
  • fYear
    2011
  • fDate
    7/1/2011
  • Firstpage
    1278
  • Lastpage
    1288
  • Abstract
    Concatenative synthesis and statistical synthesis are the two main approaches to text-to-speech (TTS) synthesis. Concatenative TTS (CTTS) stores segments of natural speech features, selected from a recorded speech database. Consequently, CTTS systems enable speech synthesis with natural quality. However, as the footprint of the stored data is reduced, desired segments are not always available in the stored data, and audible discontinuities may result. Statistical TTS (STTS) systems, on the other hand, despite having a smaller footprint than CTTS, synthesize speech that is free of such discontinuities. Yet STTS generally produces less natural speech than CTTS, as it often sounds muffled. The muffling effect is due to over-smoothing of model-generated speech features. To gain from the advantages of both approaches, we propose in this work to combine CTTS and STTS into a hybrid TTS (HTTS) system. Each utterance representation in HTTS is constructed from natural segments and model-generated segments in an interwoven fashion via a hybrid dynamic path algorithm. Reported listening tests demonstrate the validity of the proposed approach.
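    The abstract does not spell out the hybrid dynamic path algorithm. As an illustration only, the sketch below shows a generic Viterbi-style dynamic-programming search in which each slot of an utterance offers natural (database) candidates and model-generated candidates, and the sequence minimizing summed target and join costs is selected. The `Candidate` fields, the scalar boundary features, the absolute-difference join cost, and the extra target cost given to model units (standing in for their muffled quality) are all hypothetical assumptions, not the authors' formulation.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Candidate:
    kind: str           # "natural" (database segment) or "model" (statistically generated)
    start: float        # hypothetical scalar feature at the segment's start boundary
    end: float          # hypothetical scalar feature at the segment's end boundary
    target_cost: float  # hypothetical mismatch to the target; model units carry a quality penalty


def join_cost(prev: Candidate, cur: Candidate) -> float:
    # Hypothetical discontinuity measure at the boundary between consecutive segments.
    return abs(prev.end - cur.start)


def best_path(slots: Sequence[Sequence[Candidate]]) -> Tuple[float, List[Candidate]]:
    """Viterbi-style search for the cheapest candidate sequence over all slots."""
    # cost[j]: cheapest total cost of any path ending at candidate j of the current slot.
    cost = [c.target_cost for c in slots[0]]
    back: List[List[int]] = []  # back[t-1][j]: best predecessor index for slots[t][j]
    for t in range(1, len(slots)):
        new_cost, ptr = [], []
        for cur in slots[t]:
            totals = [cost[i] + join_cost(prev, cur) for i, prev in enumerate(slots[t - 1])]
            i_best = min(range(len(totals)), key=totals.__getitem__)
            new_cost.append(totals[i_best] + cur.target_cost)
            ptr.append(i_best)
        cost = new_cost
        back.append(ptr)
    # Pick the cheapest final candidate and trace the back-pointers to the start.
    j = min(range(len(cost)), key=cost.__getitem__)
    total = cost[j]
    indices = [j]
    for ptr in reversed(back):
        j = ptr[j]
        indices.append(j)
    indices.reverse()
    return total, [slots[t][k] for t, k in enumerate(indices)]


# Toy example: the only natural unit for slot 1 is badly mismatched at its
# boundaries, so the search bridges the gap with a model-generated unit.
slots = [
    [Candidate("natural", 0.0, 1.0, 0.0)],
    [Candidate("natural", 5.0, 5.0, 0.0), Candidate("model", 1.0, 2.0, 1.0)],
    [Candidate("natural", 2.0, 3.0, 0.0), Candidate("model", 2.0, 3.0, 1.0)],
]
total, path = best_path(slots)
print(total, [c.kind for c in path])
```

    In this toy case the selected sequence is natural, model, natural: the model unit costs a small quality penalty but avoids the large boundary jump of the mismatched natural unit, which mirrors the trade-off the abstract describes.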
  • Keywords
    natural language processing; speech synthesis; statistics; concatenative synthesis unit; hybrid dynamic path algorithm; hybrid text-to-speech system; interweaved fashion; model generated segment; natural segment; natural speech features segment; recorded speech database; statistical synthesis unit; Databases; Heuristic algorithms; Hidden Markov models; Hybrid power systems; Natural languages; Speech; Speech processing; Concatenative text-to-speech (CTTS); TTS synthesis; dynamic path; hybrid TTS; statistical TTS;
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Audio, Speech, and Language Processing
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TASL.2010.2089679
  • Filename
    5609194