Regional Information Center for Science and Technology - STRAIGHT model for voice conversion based on acoustical universal structure

DocumentCode :

2449978

Title :

STRAIGHT model for voice conversion based on acoustical universal structure

Author :

Gang Xu ; Qi Zhou ; Dong Zhao ; Ding Huang

fYear :

2012

fDate :

16-18 July 2012

Firstpage :

454

Lastpage :

458

Abstract :

Existing voice conversion (VC) systems based on Gaussian mixture models (GMM) suffer from over-smoothing in the GMM mapping. To address this problem, this paper proposes a method built on the Acoustical Universal Structure (ASU) that can be applied to GMM-based voice conversion. Our contributions include: 1) The speech transformation and representation using adaptive interpolation of weighted spectrum (STRAIGHT) model is adopted, which allows flexible manipulation of speech parameters such as pitch, vocal tract length, and speaking rate while maintaining high reproduction quality. 2) The method's advantage comes from introducing a predictable spectrum; the ASU is used to form the mapping relationship between the source speaker and the target speaker. 3) In the training phase, a feedback strategy is adopted, which guarantees a smooth transition of spectral parameters between frames. Objective and subjective tests indicate that the proposed method can dramatically improve VC performance in terms of speech quality, conversion accuracy, and the naturalness of speaker individuality.

Keywords :

Gaussian processes; acoustic signal processing; interpolation; signal representation; spectral analysis; speech processing; ASU; GMM mapping; Gaussian mixture models; STRAIGHT model; VC systems; acoustical universal structure; feedback strategy; pitch; speaking rate; speech quality; speech representation; speech transformation; training phase; vocal tract length; voice conversion system; weighted-spectrum model adaptive interpolation; Cepstrum; Databases; Hidden Markov models; Prediction algorithms; Speech; Training; Vectors;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Audio, Language and Image Processing (ICALIP), 2012 International Conference on

Conference_Location :

Shanghai

Print_ISBN :

978-1-4673-0173-2

Type :

conf

DOI :

10.1109/ICALIP.2012.6376660

Filename :

6376660

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2449978