DocumentCode :
591775
Title :
Voice conversion using Bayesian mixture of Probabilistic Linear Regressions and dynamic kernel features
Author :
Na Li ; Yu Qiao
Author_Institution :
Northwestern Polytech. Univ., Xi'an, China
fYear :
2012
fDate :
5-8 Dec. 2012
Firstpage :
69
Lastpage :
73
Abstract :
Voice conversion can be formulated as finding a mapping function that transforms the features of a source speaker into those of a target speaker. Gaussian mixture model (GMM)-based conversion techniques [1, 2] have been widely used in voice conversion because of their effectiveness and efficiency. In a recent work [3], we generalized GMM-based mapping to a Mixture of Probabilistic Linear Regressions (MPLR). However, both GMM-based mapping and MPLR are prone to overfitting, especially when the training utterances are sparse, and both ignore the inherent time dependency among speech features. This paper addresses these problems by introducing dynamic kernel features and conducting Bayesian analysis for MPLR. The dynamic kernel features are calculated as kernel transformations of the current, previous and next frames, so they can model both the nonlinearities and the dynamics in the features. We further develop Maximum a Posteriori (MAP) inference to alleviate overfitting by introducing a prior on the parameters of the kernel transformation. Our experimental results show that the proposed methods outperform the MPLR-based model.
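The dynamic kernel features described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the kernel, its reference points, or how utterance boundaries are handled, so the Gaussian (RBF) kernel, the `centers` list, the `gamma` width and the edge-frame replication below are all assumptions made for illustration, and the function names are hypothetical rather than taken from the paper.

```python
import math


def rbf_features(frame, centers, gamma=1.0):
    """Gaussian (RBF) kernel value between one frame and each center.

    The RBF kernel and the centers are illustrative assumptions.
    """
    return [
        math.exp(-gamma * sum((x - c) ** 2 for x, c in zip(frame, center)))
        for center in centers
    ]


def dynamic_kernel_features(frames, centers, gamma=1.0):
    """Concatenate kernel features of the previous, current and next frame.

    At the first and last frame the missing neighbour is replaced by the
    frame itself (an assumed boundary convention).
    """
    kernels = [rbf_features(f, centers, gamma) for f in frames]
    last = len(frames) - 1
    return [
        kernels[max(t - 1, 0)] + kernels[t] + kernels[min(t + 1, last)]
        for t in range(len(frames))
    ]


# Toy input: three 2-dimensional frames and two kernel centers.
frames = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
centers = [[0.0, 0.0], [1.0, 1.0]]
features = dynamic_kernel_features(frames, centers, gamma=0.5)
```

Each output vector has three times as many entries as there are centers, so a regression model fitted on these vectors sees a nonlinear transformation of the frame together with its immediate temporal context.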
Keywords :
Bayes methods; inference mechanisms; maximum likelihood estimation; regression analysis; speech synthesis; Bayesian analysis; Bayesian mixture; GMM-based mapping; Gaussian mixture model-based conversion techniques; MPLR; dynamic kernel features; kernel transformation; mapping function; maximum a posterior inference; mixture of probabilistic linear regression; overfitting problem; source speaker features; sparse training utterances; speech features; target speaker; voice conversion; Bayesian methods; Estimation; Kernel; Linear regression; Probabilistic logic; Training; Vectors; Bayesian inference; dynamic kernel features; mixture of probabilistic linear regressions; voice conversion;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Chinese Spoken Language Processing (ISCSLP), 2012 8th International Symposium on
Conference_Location :
Kowloon
Print_ISBN :
978-1-4673-2506-6
Electronic_ISBN :
978-1-4673-2505-9
Type :
conf
DOI :
10.1109/ISCSLP.2012.6423521
Filename :
6423521