DocumentCode
661318
Title
Incorporating global variance in the training phase of GMM-based voice conversion
Author
Hsin-Te Hwang ; Yu Tsao ; Hsin-Min Wang ; Yih-Ru Wang ; Sin-Horng Chen
Author_Institution
Dept. of Electr. & Comput. Eng., Nat. Chiao Tung Univ., Hsinchu, Taiwan
fYear
2013
fDate
Oct. 29 2013-Nov. 1 2013
Firstpage
1
Lastpage
6
Abstract
Maximum likelihood-based trajectory mapping considering global variance (MLGV-based trajectory mapping) has been proposed to improve the quality of the converted speech of Gaussian mixture model-based voice conversion (GMM-based VC). Although the quality of the converted speech is significantly improved, the computational cost of the online conversion process also increases, because parameter generation in MLGV-based trajectory mapping has no closed-form solution and generally requires an iterative process. To reduce the online computational cost, we propose to incorporate GV in the training phase of GMM-based VC. The conversion process can then simply adopt ML-based trajectory mapping (without considering GV in the conversion phase), which has a closed-form solution. In this way, the quality of the converted speech is expected to improve without increasing the online computational cost. Our experimental results demonstrate that the proposed method yields a significant improvement in the quality of the converted speech compared with the conventional GMM-based VC method. Meanwhile, compared with MLGV-based trajectory mapping, the proposed method provides comparable converted speech quality at a reduced computational cost in the conversion process.
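The abstract depends on the fact that ML-based trajectory mapping has a closed-form solution, whereas adding a GV term does not. The following is a minimal NumPy sketch of the standard closed-form ML trajectory generation with static and delta features, y = (W^T D^-1 W)^-1 W^T D^-1 E. It is not the authors' implementation, and it rests on these assumptions: a single feature dimension, a fixed (already selected) mixture sequence whose per-frame conditional means E and diagonal variances D are given, interleaved [static, delta] ordering, and a delta window 0.5 * (y[t+1] - y[t-1]) with replicated edge frames. The function names `build_window_matrix`, `ml_trajectory`, and `global_variance` are hypothetical.

```python
import numpy as np


def build_window_matrix(T):
    """Map a length-T static trajectory y to the interleaved length-2T
    vector [y_0, dy_0, y_1, dy_1, ...] (hypothetical helper).

    Assumed delta window: dy_t = 0.5 * (y[t+1] - y[t-1]), with the edge
    frames replicated so that a constant trajectory has zero deltas.
    """
    W = np.zeros((2 * T, T))
    for t in range(T):
        W[2 * t, t] = 1.0  # static row
        prev, nxt = max(t - 1, 0), min(t + 1, T - 1)
        W[2 * t + 1, nxt] += 0.5  # delta row
        W[2 * t + 1, prev] -= 0.5
    return W


def ml_trajectory(mean, var):
    """Closed-form ML trajectory y = (W^T D^-1 W)^-1 W^T D^-1 E.

    mean, var: length-2T arrays of interleaved static/delta conditional
    means and diagonal variances for a fixed mixture sequence (assumed).
    """
    T = mean.shape[0] // 2
    W = build_window_matrix(T)
    prec = 1.0 / var  # diagonal of D^-1
    A = W.T @ (prec[:, None] * W)
    b = W.T @ (prec * mean)
    return np.linalg.solve(A, b)  # one linear solve, no iteration


def global_variance(y):
    """GV of a one-dimensional trajectory: variance over frames."""
    return float(np.var(y))


# Usage: strongly weighted zero-mean deltas smooth the static means,
# which lowers the GV of the generated trajectory (the over-smoothing
# effect that GV-based methods aim to counter).
static = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
mean = np.zeros(2 * static.size)
mean[0::2] = static
var = np.ones(2 * static.size)
var[1::2] = 0.01  # small delta variance means a strong smoothness constraint
y = ml_trajectory(mean, var)
print(global_variance(static), global_variance(y))
```

The GV term used in MLGV-based trajectory mapping depends nonlinearly on y, so the corresponding objective no longer reduces to a single linear solve like the one above. That is why an iterative optimizer is generally needed in the conversion phase, and why moving GV into training keeps conversion at the cost of this sketch.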
Keywords
Gaussian processes; mixture models; speech processing; GMM based VC method; GMM based voice conversion; Gaussian mixture model based voice conversion; ML based trajectory mapping; MLGV based trajectory mapping; converted speech quality; global variance; iterative process; maximum likelihood based trajectory mapping; online computational cost; online conversion process; parameter generation; training phase; Computational efficiency; Joints; Speech; Speech processing; Training; Trajectory; Vectors;
fLanguage
English
Publisher
ieee
Conference_Titel
Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2013 Asia-Pacific
Conference_Location
Kaohsiung
Type
conf
DOI
10.1109/APSIPA.2013.6694179
Filename
6694179