• DocumentCode
    2801358
  • Title
    Statistical parametric speech synthesis based on product of experts
  • Author
    Zen, Heiga; Gales, Mark J F; Nankaku, Yoshihiko; Tokuda, Keiichi
  • Author_Institution
    Cambridge Res. Lab., Toshiba Res. Eur. Ltd., Cambridge, UK
  • fYear
    2010
  • fDate
    14-19 March 2010
  • Firstpage
    4242
  • Lastpage
    4245
  • Abstract
    Multiple-level acoustic models (AMs) are often combined in statistical parametric speech synthesis. Both linear and non-linear functions of the observation sequence are used as features in these AMs. This combination of multiple-level AMs can be expressed as a product of experts (PoE): the likelihoods from the AMs are scaled, multiplied together, and then normalized. Currently, these multiple-level AMs are trained individually and combined only at the synthesis stage. This paper discusses a more consistent PoE framework in which the AMs are jointly trained. A generalization of trajectory HMM training can be used for multiple-level Gaussian AMs based on linear functions. However, this is not possible for the non-linear case, so a scheme based on contrastive divergence learning is described. Experimental results show that the proposed technique provides both a mathematically elegant way to train multiple-level AMs and statistically significant improvements in the quality of synthesized speech.
  • Keywords
    acoustic signal processing; hidden Markov models; learning (artificial intelligence); nonlinear functions; speech synthesis; statistical analysis; Gaussian AM; PoE; contrastive divergence learning; linear function; multiple level acoustic models; nonlinear function; product of experts; statistical parametric speech synthesis; trajectory HMM; Computer science; Data mining; Degradation; Europe; Laboratories; Robustness; Training data; Vocoders
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on
  • Conference_Location
    Dallas, TX
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4244-4295-9
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2010.5495691
  • Filename
    5495691
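
The abstract describes a product of experts as likelihoods that are "scaled, multiplied together and then normalized". A minimal sketch of that combination for the simplest case, scalar Gaussian experts, is given below. It is an illustration only and is not taken from the paper: the function name `product_of_gaussian_experts` and the example numbers are assumptions, and it covers neither the multiple-level acoustic models nor the joint training schemes the paper discusses. For Gaussian experts the normalized product is again Gaussian, with a precision equal to the weighted sum of the expert precisions.

```python
def product_of_gaussian_experts(means, variances, weights):
    """Combine 1-D Gaussian experts: p(x) proportional to prod_i N(x; m_i, v_i) ** w_i.

    Hypothetical helper for illustration. Returns the mean and variance of
    the normalized product, which is itself a Gaussian.
    """
    # Each expert contributes its precision (1 / variance), scaled by its weight.
    precision = sum(w / v for w, v in zip(weights, variances))
    # The combined mean is the precision-weighted average of the expert means.
    mean = sum(w * m / v for w, m, v in zip(weights, means, variances)) / precision
    return mean, 1.0 / precision


# Two equally weighted experts with unit variance and means 0 and 2 (made-up values).
mean, var = product_of_gaussian_experts([0.0, 2.0], [1.0, 1.0], [1.0, 1.0])
print(mean, var)
```

The combined variance is smaller than that of either expert, which reflects how a product of experts sharpens the distribution wherever the experts agree.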