  • DocumentCode
    1416872
  • Title
    Statistical prosodic modeling: from corpus design to parameter estimation
  • Author
    Bellegarda, Jerome R.; Silverman, K.E.A.; Lenzo, Kevin E A; Anderson, Victoria
  • Author_Institution
    Spoken Language Group, Apple Comput. Inc., Cupertino, CA, USA
  • Volume
    9
  • Issue
    1
  • fYear
    2001
  • fDate
    1/1/2001
  • Firstpage
    52
  • Lastpage
    66
  • Abstract
    The increasing availability of carefully designed and collected speech corpora opens up new possibilities for the statistical estimation of formal multivariate prosodic models. At Apple Computer, statistical prosodic modeling exploits the Victoria corpus, created to broadly support ongoing speech synthesis research and development. This corpus is composed of five constituent parts, each designed to cover a specific aspect of speech synthesis: polyphones, prosodic contexts, reiterant speech, function word sequences, and continuous speech. This paper focuses on the use of the Victoria corpus in the statistical estimation of duration and pitch models for Apple's next-generation text-to-speech system in Macintosh OS X. Duration modeling relies primarily on the subcorpus of prosodic contexts, which is instrumental in uncovering empirical evidence in favor of a piecewise linear transformation in the well-known sums-of-products approach. Pitch modeling relies primarily on the subcorpus of reiterant speech, which makes possible the optimization of superpositional pitch models with more accurate underlying smooth contours. Experimental results illustrate the improved prosodic representation resulting from these new duration and pitch models.
  • Keywords
    parameter estimation; speech processing; speech synthesis; statistical analysis; Apple Computer; Macintosh OS X; Victoria corpus; continuous speech; corpus design; experimental results; formal multivariate prosodic models; function word sequences; piecewise linear transformation; pitch duration estimation; pitch modeling; polyphones; prosodic contexts; reiterant speech subcorpus; smooth contours; speech corpora; speech synthesis research and development; statistical estimation; statistical prosodic modeling; sums-of-products approach; superpositional pitch models; text-to-speech system; Context modeling; Databases; Instruments; Large-scale systems; Parameter estimation; Piecewise linear techniques; Research and development; Speech recognition; Speech synthesis
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Speech and Audio Processing
  • Publisher
    IEEE
  • ISSN
    1063-6676
  • Type
    jour
  • DOI
    10.1109/89.890071
  • Filename
    890071