Title :
Voice conversion in time-invariant speaker-independent space
Author :
Nakashika, Toru ; Takiguchi, Tetsuya ; Ariki, Yasuo
Author_Institution :
Grad. Sch. of Syst. Inf., Kobe Univ., Kobe, Japan
Abstract :
In this paper, we present a voice conversion (VC) method that uses conditional restricted Boltzmann machines (CRBMs), one per speaker, to obtain time-invariant speaker-independent spaces in which voice features are converted more easily than in the original acoustic feature space. First, we train two CRBMs, one each for the source and target speaker, independently on speaker-dependent training data (without the need to parallelize the training data). Then, a small amount of parallel data is fed into each CRBM, and the high-order features produced by the CRBMs are used to train a neural network (NN) connecting the two CRBMs. Finally, the entire network (the two CRBMs and the NN) is fine-tuned using the acoustic parallel data. Voice-conversion experiments confirmed the high performance of our method in both objective and subjective evaluations, compared with conventional GMM, NN, and speaker-dependent DBN approaches.
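The three training stages described above (speaker-dependent CRBMs, a connecting NN trained on a small parallel set, then joint fine-tuning) can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the CRBM weights here are random stand-ins for contrastive-divergence training, the connecting network is reduced to a single linear map fit by least squares, the fine-tuning stage and the decoding back to acoustics through the target CRBM are omitted, and all dimensions and data are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ToyCRBM:
    """Toy stand-in for a conditional RBM: maps an acoustic frame, conditioned
    on the previous frame, to hidden 'high-order' features. Weights are random
    here; the actual model would be trained with contrastive divergence."""
    def __init__(self, n_visible, n_hidden, seed):
        r = np.random.default_rng(seed)
        self.W = r.normal(scale=0.1, size=(n_visible, n_hidden))  # visible-to-hidden
        self.A = r.normal(scale=0.1, size=(n_visible, n_hidden))  # condition-to-hidden
        self.b = np.zeros(n_hidden)

    def transform(self, frames):
        # frames: (T, n_visible); condition each frame on its predecessor
        # (zero-padded at t = 0), giving the time-dependent term of the CRBM.
        prev = np.vstack([np.zeros_like(frames[:1]), frames[:-1]])
        return sigmoid(frames @ self.W + prev @ self.A + self.b)

# Stage 1: speaker-dependent CRBMs (trained on non-parallel data in the paper).
src_crbm = ToyCRBM(n_visible=24, n_hidden=16, seed=1)
tgt_crbm = ToyCRBM(n_visible=24, n_hidden=16, seed=2)

# Stage 2: a small amount of *parallel* data trains the connecting network;
# here it is collapsed to one linear layer fit by least squares.
T = 50
src_frames = rng.normal(size=(T, 24))  # invented parallel utterance pair
tgt_frames = rng.normal(size=(T, 24))
H_src = src_crbm.transform(src_frames)
H_tgt = tgt_crbm.transform(tgt_frames)
M, *_ = np.linalg.lstsq(H_src, H_tgt, rcond=None)  # source space -> target space

def convert(frames):
    """Source acoustic frames -> target-speaker high-order features.
    (Stage 3, joint fine-tuning, and decoding back to acoustics via the
    target CRBM's generative direction are omitted from this sketch.)"""
    return src_crbm.transform(frames) @ M

converted = convert(src_frames)
print(converted.shape)  # (50, 16): one hidden-feature vector per frame
```

The least-squares map is only a placeholder for the trained connecting NN; the point of the sketch is the data flow: per-speaker encoders trained separately, with the small parallel set used only to bridge the two hidden spaces.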
Keywords :
Boltzmann machines; learning (artificial intelligence); speech processing; CRBM; acoustic feature space; acoustic parallel data; conditional restricted Boltzmann machines; neural network; speaker-dependent training data; time-invariant speaker-independent spaces; voice conversion; voice features; Acoustics; Artificial neural networks; Data models; Speech; Speech processing; Training data; Vectors; Voice conversion; conditional restricted Boltzmann machine; deep learning; speaker specific features;
Conference_Title :
2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Conference_Location :
Florence, Italy
DOI :
10.1109/ICASSP.2014.6855136