DocumentCode :
3430286
Title :
Conditional restricted Boltzmann machine for voice conversion
Author :
Zhizheng Wu ; Eng Siong Chng ; Haizhou Li
Author_Institution :
Sch. of Comput. Eng., Nanyang Technol. Univ. (NTU), Singapore, Singapore
fYear :
2013
fDate :
6-10 July 2013
Firstpage :
104
Lastpage :
108
Abstract :
The conventional statistics-based transformation functions for voice conversion have been shown to suffer from over-smoothing and over-fitting problems. The over-smoothing problem arises from statistical averaging during the estimation of the model parameters for the transformation function. In addition, the large number of parameters in the statistical model cannot be estimated well from limited parallel training data, which results in the over-fitting problem. In this work, we investigate a robust transformation function for voice conversion using a conditional restricted Boltzmann machine. The conditional restricted Boltzmann machine, which performs linear and non-linear transformations simultaneously, is proposed to learn the relationship between source and target speech. The CMU ARCTIC corpus is used in the experimental validation, with the number of parallel training utterances varied from 2 to 40. Across these training conditions, two objective evaluation measures, mel-cepstral distortion and correlation coefficient, both show that the proposed method consistently outperforms the mainstream joint density Gaussian mixture model method.
Keywords :
Boltzmann machines; parameter estimation; speech synthesis; statistical analysis; CMU ARCTIC corpus; Mel-cepstral distortion; conditional restricted Boltzmann machine; correlation coefficient; linear transformations; model parameter estimation; nonlinear transformations; objective evaluation measures; overfitting problems; oversmoothing problems; parallel training data; robust transformation function; speech synthesis; statistical model; statistical-based transformation functions; mainstream joint density Gaussian mixture model method; target speech processing; voice conversion; Correlation; Distortion measurement; Speech; Speech processing; Training; Training data; Vectors; Speech synthesis; conditional restricted Boltzmann machine; voice conversion;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Signal and Information Processing (ChinaSIP), 2013 IEEE China Summit & International Conference on
Conference_Location :
Beijing
Type :
conf
DOI :
10.1109/ChinaSIP.2013.6625307
Filename :
6625307