Title :
FlexVoice: a parametric approach to high-quality speech synthesis
Author :
Balogh, Gyorgy ; Dobler, Ervin ; Gróbler, Tamás ; Smodies, B. ; Szepesvári, Csaba
Author_Institution :
Mindmaker Ltd., Budapest, Hungary
Abstract :
The TTS system described in this paper is based on the analysis and resynthesis of a given speaker's voice. First, the speaker's voice definition is prepared off-line: a diphone database is recorded, segmented, and analyzed every 6 ms to obtain the parameters of an all-pole (AR) filter. During on-line synthesis, the filters are excited with a mixture of a predefined periodic glottal source and white noise. Rigorous experiments were carried out to find the parameter space in which the filter coefficients at diphone boundaries can be smoothed most effectively; the best representation turned out to be the space of area ratios. Thanks to this smoothing and the carefully chosen corpus words, each diphone needs to be recorded only once, so no unit selection algorithm is needed. FlexVoice offers great flexibility in changing voice properties independently of the vocal tract parameters. This flexibility is demonstrated by a number of voice conversions, including female-to-male and female-to-child conversions. FlexVoice uses only a fraction of a PC's resources, and its quality is comparable to that of the leading TTS systems.
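The source-filter pipeline and the boundary smoothing summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation; it only assumes the textbook relations between predictor coefficients, reflection coefficients and area ratios. All function names, the 16 kHz sample rate, the linear interpolation scheme, and the impulse train standing in for the "predefined periodic glottal source" are hypothetical choices made for illustration. One property the sketch does show: any convex combination of positive area ratios is positive, so every interpolated reflection coefficient stays in (-1, 1) and each intermediate all-pole filter remains stable.

```python
import math
import random

def lpc_to_reflection(a):
    """Step-down recursion: coefficients a_1..a_p of A(z) = 1 + sum a_i z^-i
    -> reflection coefficients k_1..k_p. Assumes a stable filter (|k_m| < 1)."""
    a = list(a)
    k = [0.0] * len(a)
    for m in range(len(a), 0, -1):
        km = a[m - 1]
        k[m - 1] = km
        denom = 1.0 - km * km
        # a_i^(m-1) = (a_i^(m) - k_m * a_(m-i)^(m)) / (1 - k_m^2)
        a = [(a[i] - km * a[m - 2 - i]) / denom for i in range(m - 1)]
    return k

def reflection_to_lpc(k):
    """Step-up recursion: reflection coefficients -> coefficients of A(z)."""
    a = []
    for m, km in enumerate(k, start=1):
        # a_i^(m) = a_i^(m-1) + k_m * a_(m-i)^(m-1), and a_m^(m) = k_m
        a = [a[i] + km * a[m - 2 - i] for i in range(m - 1)] + [km]
    return a

def reflection_to_area_ratio(k):
    # Ratio of adjacent tube-section areas; positive whenever |k| < 1.
    return [(1.0 - ki) / (1.0 + ki) for ki in k]

def area_ratio_to_reflection(r):
    # Exact inverse of reflection_to_area_ratio.
    return [(1.0 - ri) / (1.0 + ri) for ri in r]

def smooth_boundary(a_left, a_right, n_steps):
    """Hypothetical boundary smoother: linearly interpolate two frames'
    filters in area-ratio space, returning n_steps + 1 coefficient sets."""
    if n_steps < 1:
        raise ValueError("n_steps must be >= 1")
    r0 = reflection_to_area_ratio(lpc_to_reflection(a_left))
    r1 = reflection_to_area_ratio(lpc_to_reflection(a_right))
    frames = []
    for s in range(n_steps + 1):
        t = s / n_steps
        r = [(1.0 - t) * x + t * y for x, y in zip(r0, r1)]
        frames.append(reflection_to_lpc(area_ratio_to_reflection(r)))
    return frames

def mixed_excitation(n, period, voicing, seed=0):
    """Mix a periodic source with white noise; voicing in [0, 1] weights
    the two. An impulse train stands in for a real glottal pulse shape."""
    rng = random.Random(seed)
    out = []
    for i in range(n):
        pulse = 1.0 if i % period == 0 else 0.0
        out.append(voicing * pulse + (1.0 - voicing) * rng.gauss(0.0, 0.1))
    return out

def synthesize(frames, excitation, frame_len):
    """Run the excitation through 1/A(z), switching coefficients every
    frame_len samples while keeping the output history across frames."""
    y = []
    for f, a in enumerate(frames):
        for n in range(f * frame_len, min((f + 1) * frame_len, len(excitation))):
            acc = excitation[n]
            for i, ai in enumerate(a, start=1):
                if n - i >= 0:
                    acc -= ai * y[n - i]  # y[n] = x[n] - sum_i a_i * y[n - i]
            y.append(acc)
    return y

FS = 16000                      # assumed sample rate (not given in the abstract)
FRAME = int(round(0.006 * FS))  # 6 ms analysis step -> 96 samples at 16 kHz
a_left = reflection_to_lpc([0.5, -0.3, 0.2])    # hypothetical end of diphone 1
a_right = reflection_to_lpc([-0.2, 0.4, -0.1])  # hypothetical start of diphone 2
frames = smooth_boundary(a_left, a_right, 4)
exc = mixed_excitation(len(frames) * FRAME, period=FS // 120, voicing=0.8)
speech = synthesize(frames, exc, FRAME)
```

Interpolating directly on the predictor coefficients a_i would be simpler, but it gives no stability guarantee for the intermediate filters; working in area-ratio space avoids that problem, which is one plausible reason such a representation suits boundary smoothing.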
Keywords :
speech synthesis; FlexVoice; TTS system; all-pole filter; corpus words; diphone boundaries; diphone database; experiments; filter coefficients; filter parameters; high-quality speech synthesis; on-line synthesis; parameter space; parametric approach; periodic glottal source; space of area ratios; speaker voice definition; vocal tract parameters; voice analysis; voice conversation; voice conversions; voice properties; voice resynthesis; white noise;
Conference_Title :
IEE Seminar on State of the Art in Speech Synthesis (Ref. No. 2000/058)
Conference_Location :
London
DOI :
10.1049/ic:20000332