  • DocumentCode
    1692835
  • Title
    Modeling spectral envelopes using restricted Boltzmann machines for statistical parametric speech synthesis
  • Author
    Zhen-Hua Ling; Li Deng; Dong Yu
  • Author_Institution
    Nat. Eng. Lab. of Speech & Language Inf. Process., Univ. of Sci. & Technol. of China, Hefei, China
  • fYear
    2013
  • Firstpage
    7825
  • Lastpage
    7829
  • Abstract
    This paper presents a new spectral modeling method for statistical parametric speech synthesis. Conventional hidden Markov model (HMM) based parametric speech synthesis adopts high-level spectral parameters, such as mel-cepstra or line spectral pairs, as its features; our method instead directly models the distribution of the lower-level, untransformed (raw) spectral envelopes. Instead of using single Gaussian distributions, we adopt restricted Boltzmann machines (RBMs) to represent the distribution of the spectral envelopes at each HMM state, anticipating that RBMs will perform better in modeling the joint distribution of high-dimensional stochastic vectors. At synthesis time, the spectral parameters derived from the spectral envelope at the estimated mode of each context-dependent RBM act as the Gaussian mean vector in the parameter generation procedure. Our experimental results show that the RBM models the distribution of the spectral envelopes with better accuracy and generalization ability than the Gaussian mixture model. As a result, the proposed method can significantly improve the naturalness of the conventional HMM-based speech synthesis system using mel-cepstra.
  • Keywords
    Boltzmann machines; hidden Markov models; spectral analysis; speech synthesis; Gaussian mean vector; Gaussian mixture model; HMM; context-dependent RBM; hidden Markov model; high-dimensional stochastic vector joint distribution; high-level spectral parameters; line spectral pairs; mel-cepstra; parameter generation procedure; restricted Boltzmann machines; single Gaussian distributions; spectral envelope modelling method; statistical parametric speech synthesis; Context modeling; Gaussian distribution; Speech; Training; Vectors; restricted Boltzmann machine; spectral envelope
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • Conference_Location
    Vancouver, BC
  • ISSN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2013.6639187
  • Filename
    6639187