DocumentCode
1958400
Title
Spectral mismatch as the index of quality of naturalness in synthetic speech
Author
Kawachale, S.P. ; Gengaje, S.R. ; Chitode, J.S.
Author_Institution
Dept. of E&TC, M.I.T., Pune, India
fYear
2009
fDate
23-26 Aug. 2009
Firstpage
808
Lastpage
813
Abstract
It is extremely difficult to build a machine that sounds identical to a human. Even the best text-to-speech (TTS) algorithm sounds robotic unless recorded human speech itself is used. However, it is not possible to create a database of every possible word in a language. Syllable-based concatenative speech synthesis (CSS) forms new words from existing words in the database. Improper concatenation with respect to the position of the syllable leads to spectral mismatch. A first step toward overcoming this is to estimate the spectral mismatch with respect to the position of the syllable. We propose a method based on power spectral density (PSD) to estimate position-dependent spectral mismatch. The PSD is plotted for 10-millisecond samples of original, properly concatenated (PC) and improperly concatenated (IC) words. These samples are then denoised so that their low-amplitude peaks can be neglected. Analysis of the PSD locates the formants in the given samples. The formants for original, properly concatenated and improperly concatenated words are then plotted. The formant plots for original and properly concatenated words are observed to be very similar for all formants, whereas improper concatenation produces extra peaks in all formants. These extra peaks can be taken as an estimate of the spectral mismatch. The results are validated using Marathi text-to-speech synthesis.
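The PSD comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the authors' PSD estimator, denoising step, or formant-location procedure, so everything here is an assumption: a Hann-windowed periodogram stands in for the PSD estimate, a relative amplitude threshold stands in for removing low-amplitude peaks, and the function names (`frame_psd`, `prominent_peaks`, `extra_peak_count`) are hypothetical. The input is a pair of synthetic sinusoid mixtures, not Marathi speech.

```python
import numpy as np

def frame_psd(frame, fs):
    """Periodogram PSD estimate of one short frame (Hann-windowed)."""
    window = np.hanning(len(frame))
    spectrum = np.fft.rfft(frame * window)
    psd = (np.abs(spectrum) ** 2) / (fs * np.sum(window ** 2))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    return freqs, psd

def prominent_peaks(freqs, psd, rel_threshold=0.1):
    """Frequencies of local PSD maxima at or above rel_threshold * max(psd).

    The threshold is an assumed stand-in for discarding low-amplitude peaks.
    """
    floor = rel_threshold * psd.max()
    inner = psd[1:-1]
    is_peak = (inner > psd[:-2]) & (inner >= psd[2:]) & (inner >= floor)
    return freqs[1:-1][is_peak]

def extra_peak_count(reference, candidate, fs, rel_threshold=0.1):
    """Number of prominent peaks in candidate beyond those in reference."""
    ref_peaks = prominent_peaks(*frame_psd(reference, fs), rel_threshold)
    cand_peaks = prominent_peaks(*frame_psd(candidate, fs), rel_threshold)
    return max(0, len(cand_peaks) - len(ref_peaks))

# Synthetic 10 ms frames at an assumed 16 kHz sampling rate (160 samples).
fs = 16000
t = np.arange(int(0.010 * fs)) / fs
reference = np.sin(2 * np.pi * 500 * t) + 0.8 * np.sin(2 * np.pi * 1500 * t)
# The candidate carries one additional spectral component at 3 kHz.
candidate = reference + 0.9 * np.sin(2 * np.pi * 3000 * t)

print(prominent_peaks(*frame_psd(reference, fs)))
print(extra_peak_count(reference, candidate, fs))
```

A peak-count difference is only a crude proxy for the extra peaks the abstract reports; a closer reproduction would track formants over successive frames of the original, PC and IC words.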
Keywords
speech synthesis; concatenative speech synthesis; power spectral density; spectral mismatch; synthetic speech; text-to-speech algorithm; Acoustic noise; Cascading style sheets; Concatenated codes; Databases; Frequency; Humans; Magnetic heads; Robots; Speech analysis; Speech synthesis;
fLanguage
English
Publisher
IEEE
Conference_Titel
Communications, Computers and Signal Processing, 2009. PacRim 2009. IEEE Pacific Rim Conference on
Conference_Location
Victoria, BC
Print_ISBN
978-1-4244-4560-8
Electronic_ISBN
978-1-4244-4561-5
Type
conf
DOI
10.1109/PACRIM.2009.5291267
Filename
5291267