Title :
Modeling Irregular Voice in Statistical Parametric Speech Synthesis With Residual Codebook Based Excitation
Author :
Csapo, Tamas Gabor ; Nemeth, G.
Author_Institution :
Dept. of Telecommun. & Media Inf., Budapest Univ. of Technol. & Econ., Budapest, Hungary
Abstract :
Statistical parametric text-to-speech synthesis is optimized for regular voices and may not yield high-quality output for speakers who frequently produce irregular phonation. A number of excitation models have recently been proposed in the hidden Markov model-based speech synthesis framework, but few of them address this phenomenon. The baseline system of this study is our previous residual codebook based excitation model, which uses frames of pitch-synchronous residuals. To model the irregular voice that typically occurs at phrase boundaries or sentence endings, two alternative extensions are proposed. The first, rule-based method applies pitch halving, amplitude scaling of residual periods with random factors, and spectral distortion. The second, data-driven approach uses a corpus of residuals extracted from irregularly phonated vowels and applies unit selection during synthesis. In perception tests of short speech segments, both methods were found to improve on the baseline excitation in preference and in similarity to the original speaker. An acoustic experiment showed that both methods can synthesize irregular voice that is close to the original irregular phonation in terms of open quotient. The proposed methods may contribute to building natural, expressive and personalized speech synthesis systems.
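Illustration (not part of the original record): the rule-based extension named in the abstract can be pictured with a minimal Python sketch. Everything specific here is an assumption made for illustration only: the function name irregularize, the choice to realize pitch halving by silencing every second residual period, and the gain range (0.3, 1.0). The abstract does not give these details, and the spectral distortion step is not covered.

```python
import random


def irregularize(periods, scale_range=(0.3, 1.0), seed=None):
    """Hypothetical sketch of a rule-based irregular excitation.

    periods: list of pitch-synchronous residual periods, each a list of floats.
    Assumption: pitch halving is realized by zeroing every second period, so
    the spacing between excitation pulses doubles. Each kept period is then
    scaled by a random factor drawn from scale_range (an assumed range).
    """
    rng = random.Random(seed)
    out = []
    for i, period in enumerate(periods):
        if i % 2 == 1:
            # Silence every second period: the effective pitch is halved.
            out.append([0.0] * len(period))
        else:
            # Random amplitude scaling of the kept residual period.
            gain = rng.uniform(*scale_range)
            out.append([gain * s for s in period])
    return out


if __name__ == "__main__":
    # Four identical toy "periods" standing in for real residual frames.
    toy = [[1.0, -1.0, 0.5]] * 4
    for p in irregularize(toy, seed=0):
        print(p)
```

The output keeps the number and length of the periods, so it could in principle be concatenated in place of the regular excitation; how the published system does this is not stated in the abstract.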
Keywords :
hidden Markov models; speech synthesis; HMM; amplitude scaling; baseline excitation; data-driven approach; expressive speech synthesis systems; hidden Markov-model; irregular voice modeling; irregularly phonated vowels; natural speech synthesis systems; open quotient; personalized speech synthesis systems; phrase boundaries; pitch halving; pitch-synchronous residuals; random factors; residual codebook based excitation model; residual periods; rule-based method; sentence endings; short speech segments; spectral distortion; statistical parametric text-to-speech synthesis; unit selection; Biological system modeling; Databases; High-temperature superconductors; Speech; Training; Creaky voice; excitation; glottalization; irregular phonation; parametric; residual; speech processing; vocal fry; voice quality
Journal_Title :
Selected Topics in Signal Processing, IEEE Journal of
DOI :
10.1109/JSTSP.2013.2292037