Title :
A dynamical system model for generating fundamental frequency for speech synthesis
Author :
Ross, Kenneth N. ; Ostendorf, Mari
Author_Institution :
Alphatech Inc., Burlington, MA, USA
fDate :
5/1/1999
Abstract :
Higher-quality speech synthesis is required for widespread use of text-to-speech (TTS) technology, and prosody is one of the components of synthesis technology with the greatest need for improvement. This paper describes a new approach to generating two important cues to prosodic patterns, fundamental frequency (F0) and energy contours, given symbolic prosodic labels and text. Specifically, the approach represents vectors of F0 and energy with a dynamical system model, which allows automatic estimation of parameters from labeled speech. Parameters at different time scales in the model are structured to capture segment-, syllable-, phrase-, and discourse-level effects based on linguistic research. F0 generation experiments with the dynamical system model show improved synthetic speech quality over the hybrid target/filter approach.
Keywords :
speech intelligibility; speech synthesis; statistical analysis; automatic parameter estimation; discourse level effects; dynamical system model; energy; energy contours; experiments; fundamental frequency generation; hybrid target/filter approach; labeled speech; linguistic research; phrase level effects; phrase-level contours; prosodic patterns; regression tree; statistical approach; syllable; symbolic prosodic labels; synthetic speech quality; text-to-speech technology; time scales; vectors; Filters; Frequency synthesizers; Hybrid power systems; Parameter estimation; Power engineering and energy; Predictive models; Regression tree analysis; Speech synthesis; Stress; Text processing;
Journal_Title :
Speech and Audio Processing, IEEE Transactions on