Abstract :
Existing voice conversion (VC) systems based on Gaussian mixture models (GMMs) suffer from over-smoothing caused by the GMM mapping. To address this problem, this paper proposes a method based on the Acoustical Universal Structure (ASU) that can be applied to GMM-based voice conversion. Our contributions are as follows: 1) the speech transformation and representation using adaptive interpolation of weighted spectrum (STRAIGHT) model is adopted, which allows flexible manipulation of speech parameters such as pitch, vocal tract length, and speaking rate while maintaining high reproduction quality; 2) the main advantage of the method comes from introducing a predictable spectrum, the ASU, which is used to form the mapping relationship between the source and target speakers; 3) a feedback strategy is adopted in the training phase, which ensures smooth transitions of spectral parameters between frames. Objective and subjective tests indicate that the proposed method dramatically improves VC performance in terms of speech quality, conversion accuracy, and the naturalness of speaker individuality.
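The over-smoothing attributed above to GMM mapping can be illustrated with a minimal sketch of the conventional joint-density GMM conversion function used by many GMM-based VC systems. This is the baseline being criticized, not the ASU method proposed in this paper. The sketch assumes a single scalar spectral feature per frame, and the names (`JointComponent`, `convert`, `COMPONENTS`) and all parameter values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class JointComponent:
    """One component of a joint source/target GMM (hypothetical 1-D layout)."""
    weight: float  # mixture weight of the component
    mu_x: float    # mean of the source feature
    mu_y: float    # mean of the target feature
    var_xx: float  # source variance Sigma_xx
    cov_yx: float  # target/source cross-covariance Sigma_yx


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    """Univariate normal density N(x; mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def posteriors(x: float, comps: list[JointComponent]) -> list[float]:
    """p(m | x): responsibility of each component for the source frame x."""
    likes = [c.weight * gaussian_pdf(x, c.mu_x, c.var_xx) for c in comps]
    total = sum(likes)
    return [lk / total for lk in likes]


def convert(x: float, comps: list[JointComponent]) -> float:
    """Conventional GMM mapping: posterior-weighted sum of per-component
    linear regressions E[y | x, m] = mu_y + Sigma_yx / Sigma_xx * (x - mu_x)."""
    return sum(
        p * (c.mu_y + c.cov_yx / c.var_xx * (x - c.mu_x))
        for p, c in zip(posteriors(x, comps), comps)
    )


# Toy two-component model with made-up parameters, for illustration only.
COMPONENTS = [
    JointComponent(weight=0.5, mu_x=-1.0, mu_y=-2.0, var_xx=0.25, cov_yx=0.2),
    JointComponent(weight=0.5, mu_x=1.0, mu_y=2.0, var_xx=0.25, cov_yx=0.2),
]

if __name__ == "__main__":
    # Convert a short sequence of source frames and print the mapped values.
    for frame in (-1.0, -0.5, 0.0, 0.5, 1.0):
        print(f"{frame:+.1f} -> {convert(frame, COMPONENTS):+.3f}")
```

A frame at a component's source mean maps close to that component's target mean, while a frame between components receives a blend of the component regressions. Because every converted value is such a conditional average, the converted trajectory tends to have less variance than real target speech, which is the over-smoothing problem that motivates the proposed method.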
Keywords :
Gaussian processes; acoustic signal processing; interpolation; signal representation; spectral analysis; speech processing; ASU; GMM mapping; Gaussian mixture models; STRAIGHT model; VC systems; acoustical universal structure; feedback strategy; pitch; speaking rate; speech quality; speech representation; speech transformation; training phase; vocal tract length; voice conversion system; weighted-spectrum model adaptive interpolation; Cepstrum; Databases; Hidden Markov models; Prediction algorithms; Speech; Training; Vectors;