Development and evaluation of unit selection and HMM-based speech synthesis systems for Tamil

Author

Boothalingam, Ramani ; Sherlin Solomi, V ; Gladston, Anushiya Rachel ; Christina, S Lilly ; Vijayalakshmi, P ; Thangavelu, Nagarajan ; Murthy, Hema A

Author_Institution

Speech Lab, SSN College of Engineering, India

fYear

2013

Firstpage

1

Lastpage

5

Abstract

An unrestricted text-to-speech system is expected to produce a speech signal, corresponding to given text in a language, that is highly intelligible to a human listener. Presently, unit selection-based synthesis (USS) and statistical parametric synthesis are the state-of-the-art techniques for this task. Earlier, in [3], a concatenative synthesizer was developed for Tamil using 12 hrs of speech data, and the syllable was shown to be the better subword unit. The current work focuses on building FestVox voices with the phoneme and the CV unit as subword units, using a reduced amount of speech data (5 hrs), and on comparing their performance in terms of quality. A further aim is to compare the performance of these synthesizers with that of the well-known HMM-based speech synthesizer. Among the phoneme- and CV-based systems built, although a phoneme-based system is bound to have more concatenation points, it is observed that it outperforms the CV-based system, with an MOS of 2.96, primarily because more examples of each phoneme are available for the given amount of speech data. Further, an HMM-based speech synthesis system is developed using 5 hrs of data. Although the speaker identity is not completely preserved in the synthesized speech, there are no sonic glitches, and the quality obtained is much better than that of the phoneme- and CV-based systems, with an MOS of 3.86. In addition, the footprint of the system is drastically reduced, from 1 GB for the USS system to 6 MB for the HMM-based speech synthesis system.

Keywords

Buildings; Context modeling; Databases; Feature extraction; Hidden Markov models; Speech; Speech synthesis;

fLanguage

English

Publisher

IEEE

Conference_Titel

Communications (NCC), 2013 National Conference on

Conference_Location

New Delhi, India

Print_ISBN

978-1-4673-5950-4

Electronic_ISBN

978-1-4673-5951-1

Type

conf

DOI

10.1109/NCC.2013.6487984

Filename

6487984