Title :
Parametric Voice Conversion Based on Bilinear Frequency Warping Plus Amplitude Scaling
Author :
Erro, Daniel ; Navas, Eva ; Hernaez, Inma
Author_Institution :
Aholab Signal Process. Lab., Univ. of the Basque Country (UPV/EHU), Bilbao, Spain
Abstract :
Voice conversion methods based on frequency warping followed by amplitude scaling have recently been proposed. These methods modify the frequency axis of the source spectrum in such a way that some significant parts of it, usually the formants, are moved towards their image in the target speaker's spectrum. Amplitude scaling is then applied to compensate for the differences between warped source spectra and target spectra. This article presents a fully parametric formulation of a frequency warping plus amplitude scaling method in which bilinear frequency warping functions are used. Introducing this constraint allows the conversion error to be described in the cepstral domain and minimized with respect to the parameters of the transformation through an iterative algorithm, even when multiple overlapping conversion classes are considered. The paper explores the advantages and limitations of this approach when applied to a cepstral representation of speech. We show that it achieves significant improvements in quality with respect to traditional methods based on Gaussian mixture models, with no loss in average conversion accuracy. Despite its relative simplicity, it achieves performance scores similar to those of state-of-the-art statistical methods involving dynamic features and global variance.
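The two operations named in the abstract, bilinear frequency warping followed by amplitude scaling, can be illustrated with a minimal sketch. It works directly on a sampled log-amplitude spectrum rather than on cepstral vectors, and it does not reproduce the paper's cepstral-domain formulation or its iterative parameter estimation. The function names bilinear_warp and convert_log_spectrum, the warping factor alpha, and the per-bin offsets log_scale are hypothetical names chosen for illustration. The warping curve used is the standard phase response of a first-order all-pass filter with |alpha| < 1.

```python
import math


def bilinear_warp(omega, alpha):
    """Warp a frequency omega (radians, 0..pi) with a first-order all-pass map.

    alpha is an illustrative warping factor in (-1, 1); alpha = 0 is the identity.
    """
    if not -1.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (-1, 1)")
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))


def convert_log_spectrum(log_spec, alpha, log_scale):
    """Frequency-warp a log-amplitude spectrum, then add per-bin scaling.

    log_spec  : samples of the source log spectrum on a uniform grid 0..pi
    alpha     : bilinear warping factor (assumed given, not estimated here)
    log_scale : additive log-amplitude correction, one value per output bin
    """
    n = len(log_spec)
    if n < 2 or len(log_scale) != n:
        raise ValueError("need at least two bins and matching lengths")
    out = []
    for k in range(n):
        omega = math.pi * k / (n - 1)
        # The warp with -alpha inverts the warp with alpha, so this finds
        # the source frequency that lands on output bin k.
        src = bilinear_warp(omega, -alpha)
        pos = min(max(src * (n - 1) / math.pi, 0.0), float(n - 1))
        lo = min(int(math.floor(pos)), n - 2)
        frac = pos - lo
        # Linear interpolation between the two neighbouring source bins.
        warped = (1.0 - frac) * log_spec[lo] + frac * log_spec[lo + 1]
        # Amplitude scaling is additive in the log domain.
        out.append(warped + log_scale[k])
    return out


if __name__ == "__main__":
    # Toy spectrum with a single peak at bin 30 of 101.
    source = [-abs(k - 30) for k in range(101)]
    converted = convert_log_spectrum(source, 0.2, [0.0] * 101)
    print("peak bin after warping:", converted.index(max(converted)))
```

With a positive alpha the peak of the toy spectrum moves towards higher frequencies, and with a negative alpha it moves towards lower ones. A single scalar alpha per transformation is what makes the warping fully parametric in this sketch; the scaling vector then absorbs whatever amplitude difference remains.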
Keywords :
Gaussian processes; cepstral analysis; iterative methods; speech processing; Gaussian mixture models; amplitude scaling; bilinear frequency warping; cepstral domain; cepstral representation; conversion error; dynamic features; frequency axis; global variance; iterative algorithm; parametric voice conversion; source spectrum; speech quality; target speaker spectrum; target spectra; warped source spectra; Cepstral analysis; Frequency conversion; Hidden Markov models; Speech; Synthesizers; Gaussian mixture model; Voice conversion; amplitude scaling; bilinear function; frequency warping;
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
DOI :
10.1109/TASL.2012.2227735