  • DocumentCode
    69705
  • Title
    Exemplar-Based Sparse Representation With Residual Compensation for Voice Conversion
  • Author
    Zhizheng Wu; Tuomas Virtanen; Eng Siong Chng; Haizhou Li
  • Author_Institution
    School of Computer Engineering, Nanyang Technological University, Singapore, Singapore
  • Volume
    22
  • Issue
    10
  • fYear
    2014
  • fDate
    Oct. 2014
  • Firstpage
    1506
  • Lastpage
    1521
  • Abstract
    We propose a nonparametric framework for voice conversion: exemplar-based sparse representation with residual compensation. In this framework, a spectrogram is reconstructed as a weighted linear combination of speech segments, called exemplars, each of which spans multiple consecutive frames. The linear combination weights are constrained to be sparse to avoid over-smoothing, and high-resolution spectra are used directly in the exemplars, without dimensionality reduction, to preserve spectral details. In addition, a spectral compression factor and a residual compensation technique are included in the framework to improve conversion performance. We conducted experiments on the VOICES database to compare the proposed method with a large set of state-of-the-art baseline methods, including the maximum likelihood Gaussian mixture model (ML-GMM) with dynamic feature constraint and partial least squares (PLS) regression based methods. The experimental results show that, compared with ML-GMM, the proposed method reduces the objective spectral distortion from 5.19 dB to 4.92 dB and increases the subjective mean opinion score and the speaker identification rate from 2.49 and 73.50% to 3.15 and 79.50%, respectively. The results also show the superiority of our method over the PLS-based methods. Furthermore, subjective listening tests indicate that the naturalness of speech converted by the proposed method is comparable with that of speech converted by the ML-GMM method with global variance constraint.
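    The core idea in the abstract (reconstructing a source spectrogram as a sparse, nonnegative combination of paired exemplars, then reusing the same weights on the target exemplars) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the weights are estimated with generalized Kullback-Leibler divergence plus an L1 sparsity penalty via standard multiplicative updates (an assumed choice), the source and target exemplar dictionaries are assumed to be already frame-aligned column by column, and the spectral compression factor and residual compensation steps are omitted.

```python
import numpy as np


def stack_frames(S, context=1):
    """Stack each frame of S (dims x frames) with `context` neighbours on each side.

    Edge frames are padded by repetition, so the output has shape
    (dims * (2 * context + 1), frames). Rows are ordered from the earliest
    neighbour to the latest, so the middle block is the original S.
    """
    d, T = S.shape
    padded = np.pad(S, ((0, 0), (context, context)), mode="edge")
    return np.vstack([padded[:, k:k + T] for k in range(2 * context + 1)])


def sparse_activations(X, A, lam=0.1, n_iter=200, eps=1e-12, seed=0):
    """Estimate nonnegative activations H with X ~= A @ H (A held fixed).

    Assumed objective: generalized KL divergence D(X || A H) + lam * sum(H),
    minimized with multiplicative updates, which keep H nonnegative.
    """
    rng = np.random.default_rng(seed)
    H = rng.random((A.shape[1], X.shape[1])) + eps
    ones = np.ones_like(X)
    denom = A.T @ ones + lam + eps  # constant across iterations
    for _ in range(n_iter):
        V = A @ H + eps  # current reconstruction
        H *= (A.T @ (X / V)) / denom
    return H


def convert(X_src, A_src, B_tgt, lam=0.1, n_iter=200):
    """Hypothetical conversion step: activations found on the source
    dictionary are applied unchanged to the paired target dictionary."""
    H = sparse_activations(X_src, A_src, lam, n_iter)
    return B_tgt @ H, H
```

    With stacked exemplars, the converted frame spectra can be read from the middle block of the returned matrix; a larger `lam` yields sparser activations, which is the mechanism the abstract credits with reducing over-smoothing.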
  • Keywords
    compressed sensing; matrix decomposition; signal reconstruction; sparse matrices; speech processing; dynamic feature constraint; exemplar based sparse representation; global variance constraint; high resolution spectra; maximum likelihood Gaussian mixture model; partial least squares regression based methods; residual compensation technique; spectral compression factor; speech segments; subjective listening tests; voice conversion; weighted linear combination; IEEE transactions; Spectrogram; Speech; Training; Training data; Vectors; Exemplar; nonnegative matrix factorization; residual compensation; sparse representation
  • fLanguage
    English
  • Journal_Title
    IEEE/ACM Transactions on Audio, Speech, and Language Processing
  • Publisher
    IEEE
  • ISSN
    2329-9290
  • Type
    jour
  • DOI
    10.1109/TASLP.2014.2333242
  • Filename
    6843941