DocumentCode
3752082
Title
A probabilistic interpretation for artificial neural network-based voice conversion
Author
Hsin-Te Hwang;Yu Tsao;Hsin-Min Wang;Yih-Ru Wang;Sin-Horng Chen
Author_Institution
Dept. of Electrical and Computer Engineering, National Chiao Tung University, Hsinchu, Taiwan
fYear
2015
Firstpage
552
Lastpage
558
Abstract
Voice conversion (VC) using artificial neural networks (ANNs) has been shown to produce converted speech of better sound quality than VC using Gaussian mixture models (GMMs). Although ANN-based VC works reasonably well, there is still room for improvement. One promising direction is to adopt successful techniques from statistical model-based parameter generation (SMPG), such as the trajectory-based mapping approaches originally designed for GMM-based VC and hidden Markov model (HMM)-based speech synthesis. This study presents a probabilistic interpretation of ANN-based VC, which allows ANN-based VC to incorporate these SMPG techniques easily. Experimental results demonstrate that two trajectory-based mapping techniques, the maximum likelihood parameter generation (MLPG) algorithm and maximum likelihood-based trajectory mapping considering global variance (referred to as MLGV), effectively improve the performance of ANN-based VC compared with both conventional ANN-based VC with frame-based mapping and GMM-based VC with the MLPG algorithm. Moreover, ANN-based VC with the trajectory-based mapping techniques achieves performance comparable to that of the state-of-the-art GMM-based VC with the MLGV algorithm.
Keywords
"Artificial neural networks","Hidden Markov models","Speech","Linear programming","Training","Acoustics"
Publisher
IEEE
Conference_Titel
Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2015 Asia-Pacific
Type
conf
DOI
10.1109/APSIPA.2015.7415330
Filename
7415330