Title :
Statistical prosodic modeling: from corpus design to parameter estimation
Author :
Bellegarda, Jerome R. ; Silverman, K.E.A. ; Lenzo, Kevin E A ; Anderson, Victoria
Author_Institution :
Spoken Language Group, Apple Comput. Inc., Cupertino, CA, USA
fDate :
1/1/2001
Abstract :
The increasing availability of carefully designed and collected speech corpora opens up new possibilities for the statistical estimation of formal multivariate prosodic models. At Apple Computer, statistical prosodic modeling exploits the Victoria corpus, created to broadly support ongoing speech synthesis research and development. This corpus is composed of five constituent parts, each designed to cover a specific aspect of speech synthesis: polyphones, prosodic contexts, reiterant speech, function word sequences, and continuous speech. This paper focuses on the use of the Victoria corpus in the statistical estimation of duration and pitch models for Apple's next-generation text-to-speech system in Macintosh OS X. Duration modeling relies primarily on the subcorpus of prosodic contexts, which was instrumental in uncovering empirical evidence in favor of a piecewise linear transformation in the well-known sums-of-products approach. Pitch modeling relies primarily on the subcorpus of reiterant speech, which makes possible the optimization of superpositional pitch models with more accurate underlying smooth contours. Experimental results illustrate the improved prosodic representation resulting from these new duration and pitch models.
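To make the duration-modeling idea concrete, here is a minimal Python sketch of a sums-of-products model whose output passes through a piecewise-linear mapping. In the sums-of-products formulation, a transformed duration T(D) is predicted as a sum of terms, each term being a product of per-factor scale values; the classical choice of T is the logarithm, and the abstract reports evidence for a piecewise-linear T instead. Because a monotonic piecewise-linear T has a piecewise-linear inverse, the sketch applies that inverse to the sum of products. All factor names (`phone`, `stress`, `final`), scale values, the term structure, and the knot positions are hypothetical placeholders; in practice they would be estimated from corpus data, and this sketch does not reproduce the authors' estimation procedure.

```python
from bisect import bisect_right
from math import prod

# Hypothetical parameters: each term maps factor name -> {level: scale value}.
# The term grouping and all numbers are illustrative assumptions only.
TERMS = [
    {"phone": {"aa": 1.2, "t": 0.6}, "stress": {0: 0.8, 1: 1.1}},  # intrinsic x stress
    {"final": {False: 0.0, True: 0.5}},                               # phrase-final term
]

# Hypothetical knots (transformed value, duration in ms) of a monotonic
# piecewise-linear inverse transform; real knots would be fit to data.
KNOTS = [(0.0, 20.0), (1.0, 80.0), (2.0, 200.0)]


def sums_of_products(features, terms=TERMS):
    """Transformed duration: sum over terms of the product of factor scales."""
    return sum(
        prod(term[factor][features[factor]] for factor in term)
        for term in terms
    )


def inverse_transform(y, knots=KNOTS):
    """Map a transformed value back to milliseconds by linear interpolation.

    Values outside the knot range are extrapolated along the end segments.
    """
    xs = [x for x, _ in knots]
    # Index of the segment containing y, clamped to the first/last segment.
    i = min(max(bisect_right(xs, y) - 1, 0), len(knots) - 2)
    (x0, d0), (x1, d1) = knots[i], knots[i + 1]
    return d0 + (y - x0) * (d1 - d0) / (x1 - x0)


def predict_duration(features):
    """Predicted segment duration (ms) for one feature vector."""
    return inverse_transform(sums_of_products(features))


if __name__ == "__main__":
    print(predict_duration({"phone": "aa", "stress": 1, "final": True}))
```

With these placeholder values, a stressed, phrase-final `aa` yields a transformed value of 1.2 × 1.1 + 0.5 = 1.82, which the piecewise-linear inverse maps to about 178.4 ms. Swapping `inverse_transform` for `math.exp` would recover the conventional log-domain variant of the model.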
Keywords :
parameter estimation; speech processing; speech synthesis; statistical analysis; Apple Computer; Macintosh OS X; Victoria corpus; continuous speech; corpus design; experimental results; formal multivariate prosodic models; function word sequences; piecewise linear transformation; pitch duration estimation; pitch modeling; polyphones; prosodic contexts; reiterant speech subcorpus; smooth contours; speech corpora; speech synthesis research and development; statistical estimation; statistical prosodic modeling; sums-of-products approach; superpositional pitch models; text-to-speech system; Context modeling; Databases; Instruments; Large-scale systems; Parameter estimation; Piecewise linear techniques; Research and development; Speech recognition; Speech synthesis;
Journal_Title :
Speech and Audio Processing, IEEE Transactions on