  • DocumentCode
    1712347
  • Title
    Development and evaluation of unit selection and HMM-based speech synthesis systems for Tamil
  • Author
    Boothalingam, Ramani; Sherlin Solomi, V; Gladston, Anushiya Rachel; Christina, S Lilly; Vijayalakshmi, P; Thangavelu, Nagarajan; Murthy, Hema A
  • Author_Institution
    Speech Lab, SSN College of Engineering, India
  • fYear
    2013
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    An unrestricted text-to-speech system is expected to produce, from any given text in a language, a speech signal that is highly intelligible to a human listener. At present, unit selection-based synthesis (USS) and statistical parametric synthesis are the state-of-the-art techniques for this task. Earlier, in [3], a concatenative synthesizer was developed for Tamil using 12 hours of speech data, and it was shown that the syllable is the better subword unit. The current work focuses on building FestVox voices with the phoneme or the CV unit as the subword unit, using a reduced amount of speech data (5 hours), and on comparing their quality. It further compares the performance of these synthesizers with that of the well-known HMM-based speech synthesizer. Of the phoneme- and CV-based systems built, the phoneme-based system outperforms the CV-based system, with an MOS of 2.96, even though it necessarily has more concatenation points. This is primarily because more examples of each phoneme are available for the given amount of speech data. Further, an HMM-based speech synthesis system is developed using 5 hours of data. Although the speaker identity is not completely preserved in its synthesized speech, there are no sonic glitches, and the quality is much better than that of the phoneme- and CV-based systems, with an MOS of 3.86. In addition, the footprint of the system is drastically reduced, from 1 GB for the USS system to 6 MB for the HMM-based speech synthesis system.
  • Keywords
    Buildings; Context modeling; Databases; Feature extraction; Hidden Markov models; Speech; Speech synthesis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    2013 National Conference on Communications (NCC)
  • Conference_Location
    New Delhi, India
  • Print_ISBN
    978-1-4673-5950-4
  • Electronic_ISBN
    978-1-4673-5951-1
  • Type
    conf
  • DOI
    10.1109/NCC.2013.6487984
  • Filename
    6487984