Parametric Voice Conversion Based on Bilinear Frequency Warping Plus Amplitude Scaling

Author

Erro, Daniel ; Navas, Eva ; Hernaez, Inma

Author_Institution

Aholab Signal Process. Lab., Univ. of the Basque Country (UPV/EHU), Bilbao, Spain

Volume

21

Issue

3

fYear

2013

fDate

March 2013

Firstpage

556

Lastpage

566

Abstract

Voice conversion methods based on frequency warping followed by amplitude scaling have recently been proposed. These methods modify the frequency axis of the source spectrum in such a manner that some significant parts of it, usually the formants, are moved towards their image in the target speaker's spectrum. Amplitude scaling is then applied to compensate for the differences between warped source spectra and target spectra. This article presents a fully parametric formulation of a frequency warping plus amplitude scaling method in which bilinear frequency warping functions are used. Introducing this constraint allows the conversion error to be described in the cepstral domain and minimized with respect to the parameters of the transformation through an iterative algorithm, even when multiple overlapping conversion classes are considered. The paper explores the advantages and limitations of this approach when applied to a cepstral representation of speech. We show that it achieves significant improvements in quality with respect to traditional methods based on Gaussian mixture models, with no loss in average conversion accuracy. Despite its relative simplicity, it achieves similar performance scores to state-of-the-art statistical methods involving dynamic features and global variance.
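
The idea summarized in the abstract, that a bilinear warp of the frequency axis acts linearly on cepstral coefficients and that amplitude scaling adds a cepstral offset, can be illustrated with a minimal numerical sketch. This is not the authors' implementation. The function names, the spectrum convention (log amplitude = c[0] + 2 Σ c[n] cos(nω)), the warping direction, and the quadrature-based construction of the matrix are all assumptions made for illustration; the training of the warping factor and scaling vector described in the paper is not shown.

```python
import numpy as np

def bilinear_warp(omega, alpha):
    """Frequency map of a first-order all-pass (bilinear) function, |alpha| < 1."""
    return omega + 2.0 * np.arctan2(alpha * np.sin(omega),
                                    1.0 - alpha * np.cos(omega))

def log_spectrum(c, omega):
    """Assumed convention: c[0] + 2 * sum_{n>=1} c[n] * cos(n * omega)."""
    n = np.arange(len(c))
    k = np.where(n == 0, 1.0, 2.0)
    return np.cos(np.outer(omega, n)) @ (k * c)

def warp_matrix(alpha, n_out, n_in, n_grid=4096):
    """Matrix A such that A @ c is the cepstrum of the warped log spectrum.

    Built by midpoint-rule quadrature of the cosine-series projection,
    chosen here for transparency rather than speed.
    """
    omega = (np.arange(n_grid) + 0.5) * np.pi / n_grid  # midpoints of (0, pi)
    n = np.arange(n_in)
    k = np.where(n == 0, 1.0, 2.0)
    # Cosine basis of the input cepstrum, evaluated at the warped frequencies.
    warped_basis = k * np.cos(np.outer(bilinear_warp(omega, alpha), n))
    # Projection back onto the unwarped cosine basis.
    analysis = np.cos(np.outer(np.arange(n_out), omega))
    return analysis @ warped_basis / n_grid

def convert(x, alpha, s):
    """Hypothetical frame conversion: warp the cepstrum x, then add offset s."""
    return warp_matrix(alpha, len(s), len(x)) @ x + s

x = np.array([0.5, 0.3, -0.2, 0.1])      # toy source cepstrum (made-up values)
y = convert(x, alpha=0.1, s=np.zeros(40))
w = np.linspace(0.1, 3.0, 7)
# The converted cepstrum reproduces the source log spectrum on the warped axis.
print(np.max(np.abs(log_spectrum(y, w) - log_spectrum(x, bilinear_warp(w, 0.1)))))
```

Because the warp is a matrix product and the scaling is an added vector, the conversion is affine in the cepstrum, which is what makes a cepstral-domain error criterion tractable for a fixed warping factor.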

Keywords

Gaussian processes; cepstral analysis; iterative methods; speech processing; Gaussian mixture models; amplitude scaling; bilinear frequency warping; cepstral domain; cepstral representation; conversion error; dynamic features; frequency axis; global variance; iterative algorithm; parametric voice conversion; source spectrum; speech quality; target speaker spectrum; target spectra; warped source spectra; Frequency conversion; Hidden Markov models; Speech; Synthesizers; Voice conversion; bilinear function; frequency warping

fLanguage

English

Journal_Title

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher

IEEE

ISSN

1558-7916

Type

jour

DOI

10.1109/TASL.2012.2227735

Filename

6353545