Exemplar-Based Sparse Representation With Residual Compensation for Voice Conversion

Author

Zhizheng Wu; Tuomas Virtanen; Eng Siong Chng; Haizhou Li

Author_Institution

School of Computer Engineering, Nanyang Technological University, Singapore, Singapore

Volume

22

Issue

10

fYear

2014

fDate

Oct. 2014

Firstpage

1506

Lastpage

1521

Abstract

We propose a nonparametric framework for voice conversion: exemplar-based sparse representation with residual compensation. In this framework, a spectrogram is reconstructed as a weighted linear combination of speech segments, called exemplars, that span multiple consecutive frames. The linear combination weights are constrained to be sparse to avoid over-smoothing, and high-resolution spectra are used directly in the exemplars, without dimensionality reduction, to preserve spectral details. In addition, a spectral compression factor and a residual compensation technique are included in the framework to improve conversion performance. We conducted experiments on the VOICES database to compare the proposed method with a large set of state-of-the-art baseline methods, including the maximum likelihood Gaussian mixture model (ML-GMM) with dynamic feature constraint and partial least squares (PLS) regression based methods. The experimental results show that, compared with ML-GMM, the proposed method reduces the objective spectral distortion from 5.19 dB to 4.92 dB, increases the subjective mean opinion score from 2.49 to 3.15, and increases the speaker identification rate from 73.50% to 79.50%. The results also show that our method outperforms the PLS-based methods. Furthermore, subjective listening tests indicate that the naturalness of speech converted by our method is comparable to that of speech converted by the ML-GMM method with global variance constraint.
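The core reconstruction step described in the abstract (a spectrogram expressed as a sparse, nonnegative combination of exemplars) can be illustrated with a minimal NumPy sketch. It is not taken from the paper: it assumes a generalized Kullback-Leibler cost with an L1 sparsity penalty, solved with the standard multiplicative update for a fixed dictionary, and it assumes that the source dictionary `A` and target dictionary `B` hold time-aligned source/target exemplar pairs in their columns. The function names and the parameters `lam`, `n_iter`, and `eps` are hypothetical. Multi-frame exemplar stacking, the spectral compression factor, and residual compensation are omitted.

```python
import numpy as np


def sparse_activations(X, A, lam=0.1, n_iter=200, eps=1e-12, seed=0):
    """Estimate nonnegative, sparse activations H such that X ~= A @ H.

    Hypothetical helper. Minimizes D_KL(X || A H) + lam * sum(H) with the
    standard multiplicative update for a fixed dictionary A.
    X: (F, T) nonnegative source spectrogram
    A: (F, N) nonnegative source exemplar dictionary, one exemplar per column
    """
    rng = np.random.default_rng(seed)
    H = rng.random((A.shape[1], X.shape[1])) + eps  # positive initialization
    # The denominator A^T 1 + lam does not change because A is fixed.
    denom = A.T @ np.ones_like(X) + lam
    for _ in range(n_iter):
        ratio = X / (A @ H + eps)      # elementwise X / (A H)
        H *= (A.T @ ratio) / denom     # multiplicative update keeps H >= 0
    return H


def convert(X, A, B, **kwargs):
    """Hypothetical conversion: reuse the source activations with target dictionary B."""
    H = sparse_activations(X, A, **kwargs)
    return B @ H, H


# Toy usage with random nonnegative dictionaries (illustrative data only).
rng = np.random.default_rng(1)
A_demo = rng.random((20, 8))   # assumed source exemplars
B_demo = rng.random((20, 8))   # assumed paired target exemplars
X_demo = A_demo @ rng.random((8, 5))
Y_demo, H_demo = convert(X_demo, A_demo, B_demo, lam=0.01, n_iter=300)
print(Y_demo.shape, H_demo.shape)
```

Because the activations are estimated against the source dictionary and then applied to the target dictionary, the sketch only produces a meaningful conversion when the columns of `A` and `B` are aligned pairs; larger values of `lam` push more activations toward zero.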

Keywords

compressed sensing; matrix decomposition; signal reconstruction; sparse matrices; speech processing; dynamic feature constraint; exemplar based sparse representation; global variance constraint; high resolution spectra; maximum likelihood Gaussian mixture model; partial least squares regression based methods; residual compensation technique; spectral compression factor; speech segments; subjective listening tests; voice conversion; weighted linear combination; IEEE transactions; Spectrogram; Speech; Training; Training data; Vectors; Exemplar; nonnegative matrix factorization; residual compensation; sparse representation

fLanguage

English

Journal_Title

IEEE/ACM Transactions on Audio, Speech, and Language Processing

Publisher

IEEE

ISSN

2329-9290

Type

jour

DOI

10.1109/TASLP.2014.2333242

Filename

6843941