DocumentCode
1416872
Title
Statistical prosodic modeling: from corpus design to parameter estimation
Author
Bellegarda, Jerome R. ; Silverman, K.E.A. ; Lenzo, Kevin E A ; Anderson, Victoria
Author_Institution
Spoken Language Group, Apple Comput. Inc., Cupertino, CA, USA
Volume
9
Issue
1
fYear
2001
fDate
1/1/2001 12:00:00 AM
Firstpage
52
Lastpage
66
Abstract
The increasing availability of carefully designed and collected speech corpora opens up new possibilities for the statistical estimation of formal multivariate prosodic models. At Apple Computer, statistical prosodic modeling exploits the Victoria corpus, created to broadly support ongoing speech synthesis research and development. This corpus is composed of five constituent parts, each designed to cover a specific aspect of speech synthesis: polyphones, prosodic contexts, reiterant speech, function word sequences, and continuous speech. This paper focuses on the use of the Victoria corpus in the statistical estimation of duration and pitch models for Apple's next-generation text-to-speech system in Macintosh OS X. Duration modeling relies primarily on the subcorpus of prosodic contexts, which is instrumental in uncovering empirical evidence in favor of a piecewise linear transformation in the well-known sums-of-products approach. Pitch modeling relies primarily on the subcorpus of reiterant speech, which makes possible the optimization of superpositional pitch models with more accurate underlying smooth contours. Experimental results illustrate the improved prosodic representation resulting from these new duration and pitch models.
Keywords
parameter estimation; speech processing; speech synthesis; statistical analysis; Apple Computer; Macintosh OS X; Victoria corpus; continuous speech; corpus design; experimental results; formal multivariate prosodic models; function word sequences; parameter estimation; piecewise linear transformation; pitch duration estimation; pitch modeling; polyphones; prosodic contexts; reiterant speech subcorpus; smooth contours; speech corpora; speech synthesis research and development; statistical estimation; statistical prosodic modeling; sums-of-products approach; superpositional pitch models; text-to-speech system; Context modeling; Databases; Instruments; Large-scale systems; Parameter estimation; Piecewise linear techniques; Research and development; Speech recognition; Speech synthesis; Terrorism;
fLanguage
English
Journal_Title
Speech and Audio Processing, IEEE Transactions on
Publisher
IEEE
ISSN
1063-6676
Type
jour
DOI
10.1109/89.890071
Filename
890071