• DocumentCode
    3164195
  • Title

    A small footprint hybrid statistical/unit selection text-to-speech synthesis system for agglutinative languages

  • Author

    Guner, Ekrem ; Demiroglu, Cenk

  • Author_Institution
    Ozyegin Univ., Istanbul, Turkey
  • fYear
    2012
  • fDate
    25-30 March 2012
  • Firstpage
    4537
  • Lastpage
    4540
  • Abstract
    Despite its success, unit selection based text-to-speech synthesis (TTS) has some disadvantages, such as sudden discontinuities in speech that distract listeners. The HMM-based TTS (HTS) approach has been attracting increasing attention from the TTS research community. One of its advantages is the lack of the spurious errors that are observed in the unit selection scheme. Another advantage of the HTS system is its small memory footprint, which makes it attractive for embedded devices. Here, we propose a novel hybrid statistical/unit selection TTS system for agglutinative languages that aims to improve the quality of the baseline HTS system while keeping the memory footprint small. The intelligibility and quality scores of the baseline system are comparable to the MOS scores reported for English in the Blizzard Challenge tests. Listeners preferred the hybrid system over the baseline system in the A/B preference tests.
  • Keywords
    hidden Markov models; natural language processing; speech intelligibility; speech synthesis; statistical analysis; English; HMM-based TTS approach; HTS approach; MOS scores; TTS research community; TTS system; agglutinative languages; baseline HTS system quality; blizzard challenge tests; embedded devices; footprint hybrid statistical selection text-to-speech synthesis system; footprint hybrid unit selection text-to-speech synthesis system; hybrid statistical unit selection; intelligibility; memory footprint requirement; quality scores; spurious errors; Databases; Hidden Markov models; High temperature superconductors; Speech; Speech synthesis; Training; Trajectory; HMM-based TTS; Turkish TTS; agglutinative languages; small memory footprint; speech synthesis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
  • Conference_Location
    Kyoto
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4673-0045-2
  • Electronic_ISBN
    1520-6149
  • Type

    conf

  • DOI
    10.1109/ICASSP.2012.6288927
  • Filename
    6288927