Title :
Voice conversion using conditional restricted Boltzmann machine
Author :
Fengyun Zhu ; Ziye Fan ; Xihong Wu
Author_Institution :
Key Lab. of Machine Perception (Minist. of Educ.), Peking Univ., Beijing, China
Abstract :
In this paper, we propose a new method for voice conversion using a conditional restricted Boltzmann machine (conditional RBM, CRBM). The joint distribution of source and target acoustic features is modeled by the RBM part of the model. Short-term temporal constraints are introduced by conditioning on contextual frames, namely the past and future frames of the source speaker. In contrast to conventional methods, the temporal structure of the data can be modeled without using dynamic features. Objective and subjective experiments were conducted to evaluate the method. Experimental results show that the short-term temporal structure is modeled well by the CRBM, and that the proposed method significantly outperforms the conventional method based on joint density Gaussian mixture models.
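As a rough illustration of the conditioning idea described in the abstract, the following minimal sketch stacks past and future source frames into a context vector and lets that context shift the hidden-unit bias of an RBM whose visible layer holds the joint source and target features. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names `stack_context` and `hidden_probabilities`, the edge padding, the context width, the layer sizes, the unit-variance visible units, and the random weights.

```python
import numpy as np


def stack_context(frames, n_past=1, n_future=1):
    """For each frame t, concatenate its past and future neighbours.

    frames has shape (T, D). Edges are padded by repeating the first or
    last frame, which is an assumption made for this sketch.
    """
    T, _ = frames.shape
    padded = np.concatenate([
        np.repeat(frames[:1], n_past, axis=0),
        frames,
        np.repeat(frames[-1:], n_future, axis=0),
    ])
    # Original frame t sits at index t + n_past in the padded array.
    offsets = [k for k in range(-n_past, n_future + 1) if k != 0]
    return np.concatenate(
        [padded[n_past + k: n_past + k + T] for k in offsets], axis=1
    )


def hidden_probabilities(visible, context, W, B, d):
    """P(h = 1 | v, c), with the context acting as a dynamic hidden bias."""
    dynamic_bias = d + context @ B
    return 1.0 / (1.0 + np.exp(-(visible @ W + dynamic_bias)))


# Hypothetical toy data: T frames of D-dimensional features per speaker.
rng = np.random.default_rng(0)
T, D, H = 5, 3, 4
source = rng.standard_normal((T, D))
target = rng.standard_normal((T, D))

context = stack_context(source, n_past=1, n_future=1)   # shape (T, 2 * D)
visible = np.concatenate([source, target], axis=1)      # joint features

# Randomly initialised parameters (untrained, for shape checking only).
W = 0.01 * rng.standard_normal((visible.shape[1], H))
B = 0.01 * rng.standard_normal((context.shape[1], H))
d = np.zeros(H)

probs = hidden_probabilities(visible, context, W, B, d)  # shape (T, H)
```

In this sketch the temporal information enters only through `context`, so no delta (dynamic) features are appended to the visible vector. A full CRBM would typically also add a context-dependent bias on the visible units and would be trained, for example with contrastive divergence; both are omitted here.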
Keywords :
Boltzmann machines; acoustic signal processing; speech processing; conditional restricted Boltzmann machine; contextual frames; dynamic features; joint distribution; short-term temporal constraints; short-term temporal structure; source acoustic features; source speaker; target acoustic features; voice conversion; Acoustics; Data models; Joints; Speech; Standards; Training; Vectors
Conference_Title :
Signal and Information Processing (ChinaSIP), 2014 IEEE China Summit & International Conference on
Conference_Location :
Xi'an
Print_ISBN :
978-1-4799-5401-8
DOI :
10.1109/ChinaSIP.2014.6889212