DocumentCode
67900
Title
Parametric Voice Conversion Based on Bilinear Frequency Warping Plus Amplitude Scaling
Author
Erro, Daniel ; Navas, Eva ; Hernaez, Inma
Author_Institution
Aholab Signal Process. Lab., Univ. of the Basque Country (UPV/EHU), Bilbao, Spain
Volume
21
Issue
3
fYear
2013
fDate
March 2013
Firstpage
556
Lastpage
566
Abstract
Voice conversion methods based on frequency warping followed by amplitude scaling have recently been proposed. These methods modify the frequency axis of the source spectrum in such a manner that some significant parts of it, usually the formants, are moved towards their image in the target speaker's spectrum. Amplitude scaling is then applied to compensate for the differences between warped source spectra and target spectra. This article presents a fully parametric formulation of a frequency warping plus amplitude scaling method in which bilinear frequency warping functions are used. Introducing this constraint allows the conversion error to be described in the cepstral domain and minimized with respect to the parameters of the transformation through an iterative algorithm, even when multiple overlapping conversion classes are considered. The paper explores the advantages and limitations of this approach when applied to a cepstral representation of speech. We show that it achieves significant improvements in quality with respect to traditional methods based on Gaussian mixture models, with no loss in average conversion accuracy. Despite its relative simplicity, it achieves performance scores similar to those of state-of-the-art statistical methods involving dynamic features and global variance.
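The two operations named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the paper works in the cepstral domain and estimates its parameters iteratively, whereas this example resamples a sampled log-amplitude spectrum directly. The function names, the uniform grid on [0, pi], and the example value alpha = 0.2 are assumptions made for illustration. The warping curve is the standard phase response of a first-order all-pass (bilinear) transform.

```python
import numpy as np

def bilinear_warp(omega, alpha):
    """Map frequency omega (radians, 0..pi) through a bilinear warp, |alpha| < 1."""
    omega = np.asarray(omega, dtype=float)
    return omega + 2.0 * np.arctan2(alpha * np.sin(omega),
                                    1.0 - alpha * np.cos(omega))

def warp_and_scale(log_spectrum, alpha, log_scale):
    """Warp the frequency axis of a log-amplitude spectrum, then apply
    amplitude scaling, which is additive in the log domain."""
    log_spectrum = np.asarray(log_spectrum, dtype=float)
    omega = np.linspace(0.0, np.pi, len(log_spectrum))
    # The source value at omega lands at bilinear_warp(omega, alpha);
    # the curve is monotonic, so np.interp can resample onto the uniform grid.
    moved_to = bilinear_warp(omega, alpha)
    warped = np.interp(omega, moved_to, log_spectrum)
    return warped + log_scale

# Usage: a single spectral peak near 1.0 rad moves upward for alpha > 0.
omega = np.linspace(0.0, np.pi, 257)
source = np.exp(-((omega - 1.0) ** 2) / 0.01)
converted = warp_and_scale(source, alpha=0.2, log_scale=0.0)
peak_position = omega[np.argmax(converted)]
```

With alpha = 0 the warp is the identity and only the additive correction remains; positive alpha shifts spectral features such as formant peaks towards higher frequencies, negative alpha towards lower ones.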
Keywords
Gaussian processes; cepstral analysis; iterative methods; speech processing; Gaussian mixture models; amplitude scaling; bilinear frequency warping; cepstral domain; cepstral representation; conversion error; dynamic features; frequency axis; global variance; iterative algorithm; parametric voice conversion; source spectrum; speech quality; target speaker spectrum; target spectra; warped source spectra; Frequency conversion; Hidden Markov models; Speech; Synthesizers; Voice conversion; bilinear function; frequency warping
fLanguage
English
Journal_Title
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher
IEEE
ISSN
1558-7916
Type
jour
DOI
10.1109/TASL.2012.2227735
Filename
6353545