Title :
Modeling spectral envelopes using restricted Boltzmann machines for statistical parametric speech synthesis
Author :
Zhen-Hua Ling ; Li Deng ; Dong Yu
Author_Institution :
Nat. Eng. Lab. of Speech & Language Inf. Process., Univ. of Sci. & Technol. of China, Hefei, China
Abstract :
This paper presents a new spectral modeling method for statistical parametric speech synthesis. Conventional hidden Markov model (HMM) based parametric speech synthesis uses high-level spectral parameters, such as mel-cepstra or line spectral pairs, as its features. In contrast, our method directly models the distribution of the lower-level, untransformed (raw) spectral envelopes. Instead of using single Gaussian distributions, we adopt restricted Boltzmann machines (RBMs) to represent the distribution of the spectral envelopes at each HMM state. We expect RBMs to give superior performance in modeling the joint distribution of high-dimensional stochastic vectors. At synthesis time, the spectral parameters are derived from the spectral envelope corresponding to the estimated mode of each context-dependent RBM, and they act as the Gaussian mean vector in the parameter generation procedure. Our experimental results show that the RBM models the distribution of the spectral envelopes with better accuracy and generalization ability than the Gaussian mixture model. As a result, the proposed method significantly improves naturalness over the conventional HMM-based speech synthesis system that uses mel-cepstra.
Keywords :
Boltzmann machines; hidden Markov models; spectral analysis; speech synthesis; Gaussian mean vector; Gaussian mixture model; HMM; context-dependent RBM; hidden Markov model; high-dimensional stochastic vector joint distribution; high-level spectral parameters; line spectral pairs; mel-cepstra; parameter generation procedure; restricted Boltzmann machines; single Gaussian distributions; spectral envelope modelling method; statistical parametric speech synthesis; Context modeling; Gaussian distribution; Speech; Training; Vectors; restricted Boltzmann machine; spectral envelope
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location :
Vancouver, BC
DOI :
10.1109/ICASSP.2013.6639187