Spectral mismatch as the index of quality of naturalness in synthetic speech

Author

Kawachale, S.P. ; Gengaje, S.R. ; Chitode, J.S.

Author_Institution

Dept. of E&TC, M.I.T., Pune, India

fYear

2009

fDate

23-26 Aug. 2009

Firstpage

808

Lastpage

813

Abstract

It is extremely difficult to build a machine that sounds identical to a human. Hence even the best text-to-speech (TTS) algorithm sounds robotic unless human speech itself is used in it. However, it is not possible to create a database of every possible word in a language. Syllable-based concatenative speech synthesis (CSS) forms new words from syllables of existing words in the database. Concatenating a syllable without regard to its position in the word leads to spectral mismatch. A first step toward overcoming this is to estimate the spectral mismatch with respect to the position of the syllable. We propose a method based on power spectral density (PSD) to estimate position-dependent spectral mismatch. The PSD is plotted for 10 millisecond samples of original, properly concatenated (PC) and improperly concatenated (IC) words. The samples are then denoised so that their low-amplitude peaks can be neglected. Analysis of the PSD locates the formants in the given samples. The formants for the original, properly concatenated and improperly concatenated words are then plotted. It is observed that the formant plots for original and properly concatenated words are very similar for all formants, while for improper concatenation extra peaks are observed in all formants. These extra peaks can be taken as an estimate of the spectral mismatch. The results are validated using Marathi text-to-speech synthesis.
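
The comparison described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the PSD estimator, the denoising step, or how formants are extracted. The sketch assumes a Hann-windowed periodogram, a relative power floor in place of denoising, and simple local-maximum picking in place of formant analysis. The 16 kHz sampling rate, the helper names (periodogram, spectral_peaks, synthetic_frame) and the sinusoidal test frames are all hypothetical stand-ins for real speech data.

```python
import cmath
import math


def periodogram(frame, fs):
    """Return (freqs, psd) for one short frame using a Hann-windowed naive DFT."""
    n = len(frame)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    x = [s * w for s, w in zip(frame, window)]
    norm = fs * sum(w * w for w in window)  # window power normalisation
    freqs, psd = [], []
    for k in range(n // 2 + 1):
        acc = sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        freqs.append(k * fs / n)
        psd.append(abs(acc) ** 2 / norm)
    return freqs, psd


def spectral_peaks(freqs, psd, floor_ratio=0.1):
    """Frequencies of local PSD maxima above floor_ratio * max(psd).

    The floor is an assumed stand-in for the paper's denoising step:
    it discards low-amplitude peaks before the frames are compared.
    """
    floor = floor_ratio * max(psd)
    return [
        freqs[k]
        for k in range(1, len(psd) - 1)
        if psd[k] > floor and psd[k] > psd[k - 1] and psd[k] >= psd[k + 1]
    ]


def synthetic_frame(components, fs=16000, duration=0.010):
    """Hypothetical stand-in for a 10 ms speech frame: a sum of sinusoids.

    components is a list of (frequency_hz, amplitude) pairs.
    """
    n = int(round(fs * duration))
    return [
        sum(a * math.sin(2 * math.pi * f * i / fs) for f, a in components)
        for i in range(n)
    ]


FS = 16000  # assumed sampling rate, not taken from the paper

# Reference frame (playing the role of the original or PC word) and a frame
# with one extra spectral component (playing the role of the IC word).
reference = synthetic_frame([(500, 1.0), (1500, 0.8), (2500, 0.6)], FS)
mismatched = synthetic_frame(
    [(500, 1.0), (1500, 0.8), (2500, 0.6), (3500, 0.7)], FS
)

ref_peaks = spectral_peaks(*periodogram(reference, FS))
mis_peaks = spectral_peaks(*periodogram(mismatched, FS))

# Extra peaks relative to the reference serve as a crude mismatch indicator.
extra_peaks = len(mis_peaks) - len(ref_peaks)
```

In this toy setup the added component shows up as one additional peak in the second frame. With real recordings the frames would come from time-aligned segments of the original, PC and IC words, and the floor ratio would need tuning.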

Keywords

speech synthesis; concatenative speech synthesis; power spectral density; spectral mismatch; synthetic speech; text-to-speech algorithm; Acoustic noise; Cascading style sheets; Concatenated codes; Databases; Frequency; Humans; Magnetic heads; Robots; Speech analysis

fLanguage

English

Publisher

ieee

Conference_Titel

Communications, Computers and Signal Processing, 2009. PacRim 2009. IEEE Pacific Rim Conference on

Conference_Location

Victoria, BC

Print_ISBN

978-1-4244-4560-8

Electronic_ISBN

978-1-4244-4561-5

Type

conf

DOI

10.1109/PACRIM.2009.5291267

Filename

5291267