• DocumentCode
    67900
  • Title

    Parametric Voice Conversion Based on Bilinear Frequency Warping Plus Amplitude Scaling

  • Author

    Erro, Daniel ; Navas, Eva ; Hernaez, Inma

  • Author_Institution
    Aholab Signal Process. Lab., Univ. of the Basque Country (UPV/EHU), Bilbao, Spain
  • Volume
    21
  • Issue
    3
  • fYear
    2013
  • fDate
    Mar-13
  • Firstpage
    556
  • Lastpage
    566
  • Abstract
    Voice conversion methods based on frequency warping followed by amplitude scaling have recently been proposed. These methods modify the frequency axis of the source spectrum in such a manner that some significant parts of it, usually the formants, are moved towards their image in the target speaker's spectrum. Amplitude scaling is then applied to compensate for the differences between warped source spectra and target spectra. This article presents a fully parametric formulation of a frequency warping plus amplitude scaling method in which bilinear frequency warping functions are used. Introducing this constraint allows the conversion error to be described in the cepstral domain and minimized with respect to the parameters of the transformation through an iterative algorithm, even when multiple overlapping conversion classes are considered. The paper explores the advantages and limitations of this approach when applied to a cepstral representation of speech. We show that it achieves significant improvements in quality with respect to traditional methods based on Gaussian mixture models, with no loss in average conversion accuracy. Despite its relative simplicity, it achieves performance scores similar to those of state-of-the-art statistical methods involving dynamic features and global variance.
  • Keywords
    Gaussian processes; cepstral analysis; iterative methods; speech processing; Gaussian mixture models; amplitude scaling; bilinear frequency warping; cepstral domain; cepstral representation; conversion error; dynamic features; frequency axis; global variance; iterative algorithm; parametric voice conversion; source spectrum; speech quality; target speaker spectrum; target spectra; warped source spectra; Cepstral analysis; Frequency conversion; Hidden Markov models; Speech; Synthesizers; Gaussian mixture model; Voice conversion; amplitude scaling; bilinear function; frequency warping;
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type

    jour

  • DOI
    10.1109/TASL.2012.2227735
  • Filename
    6353545
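
The frequency warping plus amplitude scaling idea summarized in the abstract can be illustrated with a minimal sketch. It is not the paper's cepstral-domain formulation or its iterative training algorithm: it operates directly on a sampled log-amplitude spectrum and uses linear interpolation. The function names, the sign convention for `alpha`, and the additive `log_scale` term are assumptions made for illustration; the warping curve is the standard phase response of a first-order all-pass (bilinear) function.

```python
import numpy as np


def bilinear_warp(omega, alpha):
    """Warp frequencies omega (radians, in [0, pi]) with a bilinear function.

    This is the phase response of a first-order all-pass filter with
    parameter alpha, |alpha| < 1. With this (assumed) sign convention,
    alpha > 0 moves spectral features towards higher frequencies.
    """
    omega = np.asarray(omega, dtype=float)
    return omega + 2.0 * np.arctan(
        alpha * np.sin(omega) / (1.0 - alpha * np.cos(omega))
    )


def convert_log_spectrum(log_src, alpha, log_scale=0.0):
    """Frequency-warp a log-amplitude spectrum, then apply amplitude scaling.

    log_src   : log-amplitude samples on a uniform grid over [0, pi]
    alpha     : bilinear warping parameter (hypothetical per-frame value)
    log_scale : additive correction in the log domain (scalar or array),
                standing in for the amplitude scaling step
    """
    log_src = np.asarray(log_src, dtype=float)
    omega = np.linspace(0.0, np.pi, len(log_src))
    # The output at frequency w reads the source at the inverse-warped
    # frequency; the inverse of a bilinear warp with alpha uses -alpha.
    source_positions = bilinear_warp(omega, -alpha)
    warped = np.interp(source_positions, omega, log_src)
    return warped + log_scale
```

With `alpha = 0` the warping is the identity, so only the amplitude correction changes the spectrum. In a full system both `alpha` and the amplitude correction would be estimated from training data rather than set by hand.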