Statistical parametric speech synthesis based on product of experts

Author

Zen, Heiga ; Gales, Mark J F ; Nankaku, Yoshihiko ; Tokuda, Keiichi

Author_Institution

Cambridge Res. Lab., Toshiba Res. Eur. Ltd., Cambridge, UK

fYear

2010

fDate

14-19 March 2010

Firstpage

4242

Lastpage

4245

Abstract

Multiple-level acoustic models (AMs) are often combined in statistical parametric speech synthesis. Both linear and non-linear functions of the observation sequence are used as features in these AMs. This combination of multiple-level AMs can be expressed as a product of experts (PoE): the likelihoods from the AMs are scaled, multiplied together, and then normalized. Currently, these multiple-level AMs are trained individually and combined only at the synthesis stage. This paper discusses a more consistent PoE framework in which the AMs are jointly trained. A generalization of trajectory HMM training can be used for multiple-level Gaussian AMs based on linear functions. For the non-linear case, however, this is not possible, so a scheme based on contrastive divergence learning is described. Experimental results show that the proposed technique provides both a mathematically elegant way to train multiple-level AMs and statistically significant improvements in the quality of synthesized speech.
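
The "scaled, multiplied together and then normalized" combination described in the abstract can be illustrated with a minimal sketch for the simplest case: one-dimensional Gaussian experts. This is not the paper's implementation. The function name `product_of_gaussian_experts` and the example values are assumptions made purely for illustration. The sketch relies only on the standard result that a renormalized product of Gaussians raised to positive weights is again Gaussian.

```python
def product_of_gaussian_experts(means, variances, weights):
    """Combine 1-D Gaussian experts N(x; mean_i, var_i) ** weight_i.

    Hypothetical helper for illustration only. After renormalization the
    scaled product is Gaussian; its precision is the weighted sum of the
    expert precisions, and its mean is the precision-weighted average of
    the expert means. Returns (mean, variance).
    """
    if not (len(means) == len(variances) == len(weights)):
        raise ValueError("means, variances and weights must have equal length")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")

    # Scaling an expert by weight w multiplies its precision (1 / var) by w.
    precision = sum(w / v for w, v in zip(weights, variances))
    if precision <= 0:
        raise ValueError("combined precision must be positive")

    # Multiplying Gaussians adds their precision-weighted means.
    weighted_mean = sum(w * m / v for w, m, v in zip(weights, means, variances))
    return weighted_mean / precision, 1.0 / precision


# Two equally weighted unit-variance experts centred at 0 and 2.
poe_mean, poe_var = product_of_gaussian_experts(
    means=[0.0, 2.0], variances=[1.0, 1.0], weights=[1.0, 1.0]
)
print(poe_mean, poe_var)
```

In this toy case the combined distribution is sharper than either expert, since its precision is the sum of the scaled expert precisions. For experts defined on non-linear features the normalization term generally has no closed form, which is the situation the abstract addresses with contrastive divergence learning.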

Keywords

acoustic signal processing; hidden Markov models; learning (artificial intelligence); nonlinear functions; speech synthesis; statistical analysis; Gaussian AM; PoE; contrastive divergence learning; linear function; multiple level acoustic models; nonlinear function; product of experts; statistical parametric speech synthesis; trajectory HMM; Computer science; Data mining; Degradation; Europe; Hidden Markov models; Laboratories; Robustness; Speech synthesis; Training data; Vocoders; Statistical parametric speech synthesis; product of experts; trajectory HMM;

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on

Conference_Location

Dallas, TX

ISSN

1520-6149

Print_ISBN

978-1-4244-4295-9

Electronic_ISBN

1520-6149

Type

conf

DOI

10.1109/ICASSP.2010.5495691

Filename

5495691