Title :
Continuous probabilistic transform for voice conversion
Author :
Stylianou, Yannis ; Cappé, Olivier ; Moulines, Eric
Author_Institution :
AT&T Bell Labs., Murray Hill, NJ, USA
Date :
3/1/1998
Abstract :
Voice conversion, as considered in this paper, is defined as modifying the speech signal of one speaker (source speaker) so that it sounds as if it had been pronounced by a different speaker (target speaker). Our contribution includes the design of a new methodology for representing the relationship between two sets of spectral envelopes. The proposed method is based on the use of a Gaussian mixture model of the source speaker spectral envelopes. The conversion itself is represented by a continuous parametric function which takes into account the probabilistic classification provided by the mixture model. The parameters of the conversion function are estimated by least squares optimization on the training data. This conversion method is implemented in the context of the HNM (harmonic+noise model) system, which allows high-quality modifications of speech signals. Compared to earlier methods based on vector quantization, the proposed conversion scheme results in a much better match between the converted envelopes and the target envelopes. Evaluation by objective tests and formal listening tests shows that the proposed transform greatly improves the quality and naturalness of the converted speech signals compared with previously proposed conversion methods.
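Illustrative sketch (not from the paper): the abstract describes a conversion function that is continuous because it weights per-class mappings by the posterior probabilities of a Gaussian mixture model, rather than switching between codebook entries as vector quantization does. The Python below is a minimal one-dimensional sketch of that idea. The affine form of each per-component map, the function names, and all parameter values are assumptions made for illustration; in particular the offsets and slopes are hand-picked here, whereas the abstract states that the conversion parameters are estimated by least squares on training data.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def posteriors(x, weights, means, variances):
    """Posterior probability of each mixture component given x (soft classification)."""
    joint = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]


def convert(x, weights, means, variances, offsets, slopes):
    """Posterior-weighted sum of per-component affine maps (assumed form)."""
    post = posteriors(x, weights, means, variances)
    return sum(
        p * (nu + g * (x - m) / v)
        for p, m, v, nu, g in zip(post, means, variances, offsets, slopes)
    )


# Hypothetical two-component source model and hand-picked conversion parameters.
weights = [0.5, 0.5]
means = [-1.0, 1.0]
variances = [0.25, 0.25]
offsets = [-2.0, 3.0]   # assumed per-component target offsets
slopes = [0.25, 0.25]   # assumed per-component slope terms

for x in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(x, round(convert(x, weights, means, variances, offsets, slopes), 4))
```

Near a component mean the output is dominated by that component's map, and between the means the output blends the two maps smoothly, which is the qualitative difference from a hard, codebook-style assignment.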
Keywords :
Gaussian processes; least squares approximations; pattern classification; spectral analysis; speech processing; speech recognition; transforms; Gaussian mixture model; continuous parametric function; continuous probabilistic transform; formal listening tests; harmonic+noise model; high-quality modifications; least squares optimization; objective tests; probabilistic classification; source speaker; spectral envelopes; speech signal; target speaker; voice conversion; Acoustic noise; Humans; Least squares approximation; Loudspeakers; Natural languages; Psychoacoustic models; Speaker recognition; Speech synthesis; Testing; Training data
Journal_Title :
Speech and Audio Processing, IEEE Transactions on