DocumentCode :
178936
Title :
Spectral modeling using neural autoregressive distribution estimators for statistical parametric speech synthesis
Author :
Xiang Yin ; Zhen-Hua Ling ; Li-Rong Dai
Author_Institution :
Nat. Eng. Lab. for Speech & Language Inf. Process., Univ. of Sci. & Technol. of China, Hefei, China
fYear :
2014
fDate :
4-9 May 2014
Firstpage :
3824
Lastpage :
3828
Abstract :
This paper describes a new approach that uses neural autoregressive distribution estimators (NADE) for spectral modeling in statistical parametric speech synthesis. To alleviate the over-smoothing effect on the generated spectral structures, our previous work proposed a restricted Boltzmann machine (RBM) modeling method, in which the RBM represents the joint distribution of high-dimensional, physically meaningful spectral envelopes. However, the RBM cannot provide a tractable partition function, even at a moderate size. In this paper, we introduce NADE to model the distribution of mel-cepstra and spectral envelopes at each HMM state, because NADE makes it simple to evaluate the probability of given observations. At the synthesis stage, the spectral parameters derived from the mode of each context-dependent NADE replace the Gaussian mean vector in the parameter generation process. Experimental results show that NADE models the distribution of the spectral features more accurately than the RBM. Furthermore, the proposed method significantly improves the naturalness of the conventional HMM-based speech synthesis system using mel-cepstra and outperforms RBM-based spectral modeling.
Keywords :
Boltzmann machines; autoregressive processes; cepstral analysis; hidden Markov models; speech synthesis; HMM state; NADE; RBM; generated spectral structures; mel-cepstra envelopes; neural autoregressive distribution estimators; over-smoothing effect; restricted Boltzmann machine; spectral envelopes; spectral features; spectral modeling; statistical parametric speech synthesis; Computational modeling; Feature extraction; Hidden Markov models; Speech; Speech synthesis; Training; Vectors; Speech synthesis; hidden Markov model; neural autoregressive distribution estimator; restricted Boltzmann machine;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location :
Florence
Type :
conf
DOI :
10.1109/ICASSP.2014.6854317
Filename :
6854317