Title :
Pitch transformation in neural network based voice conversion
Author :
Feng-Long Xie ; Yao Qian ; Frank K. Soong ; Haifeng Li
Author_Institution :
Harbin Inst. of Technol., Harbin, China
Abstract :
In voice conversion, prosody conversion, and pitch conversion in particular, is a challenging research topic because of the discontinuous nature of pitch. Conventionally, pitch conversion is achieved by adjusting the mean and variance of the source pitch distribution to match those of the target pitch distribution. This method removes most of the detailed information of the speaker's prosody and maintains only the global F0 contour. In this paper, we propose a neural network based pitch conversion system which converts F0 and spectral features jointly, frame by frame. Experimental results show that neural network based pitch conversion can significantly reduce the unvoiced/voiced (U/V) error and the RMSE of F0 between the converted pitch and the target pitch, compared with the conventional Gaussian normalized transformation method. Wavelet decomposition of F0 can further improve the performance of voice conversion.
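The conventional baseline named in the abstract, mean and variance adjustment of the pitch distribution, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the statistics are computed on log-F0 over voiced frames only and that unvoiced frames are encoded as 0.0; the abstract states neither. The function names and the toy contours are hypothetical.

```python
import math
from statistics import mean, pstdev


def log_f0_stats(f0_frames):
    """Mean and standard deviation of log-F0 over voiced frames (F0 > 0)."""
    logs = [math.log(f) for f in f0_frames if f > 0]
    return mean(logs), pstdev(logs)


def gaussian_normalized_f0(src_f0, src_stats, tgt_stats):
    """Shift and scale source log-F0 so its mean and variance match the target's.

    Assumption: unvoiced frames are encoded as 0.0 and are passed through
    unchanged, so the voiced/unvoiced decision is copied from the source.
    """
    mu_s, sd_s = src_stats
    mu_t, sd_t = tgt_stats
    converted = []
    for f in src_f0:
        if f <= 0:
            converted.append(0.0)  # unvoiced frame stays unvoiced
        else:
            z = (math.log(f) - mu_s) / sd_s  # normalize with source statistics
            converted.append(math.exp(mu_t + sd_t * z))  # rescale to target
    return converted


# Hypothetical toy F0 contours in Hz (0.0 marks an unvoiced frame).
src = [0.0, 110.0, 120.0, 130.0, 0.0, 115.0]
tgt = [210.0, 0.0, 230.0, 250.0, 220.0, 0.0]

converted = gaussian_normalized_f0(src, log_f0_stats(src), log_f0_stats(tgt))
```

In this sketch every voiced frame is transformed with the same two constants and the voicing pattern is inherited from the source, so frame-level detail and voicing differences between the speakers are not modeled.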
Keywords :
neural nets; speech processing; statistical analysis; wavelet transforms; Gaussian normalized transformation method; RMSE; global F0 contour; mean; neural network based voice conversion; pitch conversion; pitch discontinuity property; pitch distribution; pitch transformation; prosody conversion; root mean square error; spectral feature; variance; wavelet decomposition; Artificial neural networks; Context; Speech; Training; Vectors; Wavelet transforms; neural network; pitch; voice conversion;
Conference_Title :
Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on
Conference_Location :
Singapore
DOI :
10.1109/ISCSLP.2014.6936599