Title :
Phoneme-based spectral voice conversion using temporal decomposition and Gaussian mixture model
Author :
Phu Nguyen, Binh; Akagi, Masato
Author_Institution :
Sch. of Inf. Sci., Japan Adv. Inst. of Sci. & Technol., Nomi
Abstract :
Among state-of-the-art voice conversion systems, GMM-based methods are regarded as some of the best. However, the quality of converted speech is still far from natural. There are three main reasons for this degradation: (i) modeling the distribution of acoustic features often uses unstable frames, which degrades the precision of the GMM parameters; (ii) the transformation function may generate discontinuous features if frames are processed independently; and (iii) an over-smoothing effect occurs in each converted frame. This paper presents a new spectral voice conversion method that addresses the first two drawbacks of standard spectral modification methods: insufficient precision of the GMM parameters and insufficient smoothness of the converted spectra between frames. A speech analysis technique called temporal decomposition (TD), which decomposes speech into event targets and event functions, is used to model the spectral evolution effectively. To improve the estimation of the GMM parameters, we use phoneme-based features of event targets as spectral vectors in the training procedure, which takes into account the relations between spectral parameters within each phoneme and avoids using spectral parameters from transition parts. To enhance the continuity of the speech spectra, we convert only the event targets instead of converting source features to target features frame by frame, and the smoothness of the converted speech is ensured by the shape of the event functions. Experimental results show that the proposed spectral voice conversion method improves both the speech quality and the speaker individuality of converted speech.
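The idea of converting only event targets and letting the event functions carry the smoothness can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the piecewise-linear event functions, the event positions in `centers`, the random targets, and the affine map (`A`, `b`) that stands in for a trained GMM-based transformation. The sketch only shows the reconstruction step, in which each frame is a weighted sum of event targets.

```python
import numpy as np

def event_functions(num_frames, centers):
    """Hypothetical event functions: piecewise-linear weights that peak at
    each event center, overlap only with neighbours, and sum to 1 per frame."""
    frames = np.arange(num_frames)
    one_hot = np.eye(len(centers))
    return np.stack([np.interp(frames, centers, one_hot[k])
                     for k in range(len(centers))])  # shape (K, N)

def reconstruct(targets, phi):
    """Frame n is sum_k targets[k] * phi[k, n]; returns shape (N, P)."""
    return phi.T @ targets

def convert_targets(targets, A, b):
    """Stand-in for the GMM-based mapping (assumption): a fixed affine map
    applied to each event target independently."""
    return targets @ A.T + b

rng = np.random.default_rng(0)
num_frames, dim = 40, 4
centers = np.array([0, 10, 25, 39])           # assumed event locations (frames)
source_targets = rng.normal(size=(len(centers), dim))
phi = event_functions(num_frames, centers)

A = 0.9 * np.eye(dim)                          # illustrative mapping parameters
b = np.full(dim, 0.1)

# Only K event targets are converted, not all N frames.
converted_targets = convert_targets(source_targets, A, b)
# Source event functions are reused, so the frame trajectory stays continuous.
converted_frames = reconstruct(converted_targets, phi)
```

In this sketch the converted trajectory passes through each converted target at its event center and interpolates between neighbouring targets elsewhere, so no frame is converted in isolation.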
Keywords :
Gaussian processes; speech enhancement; Gaussian mixture model; acoustic features; over-smooth effect; phoneme-based spectral voice conversion; speech analysis technique; speech spectra enhancement; standard spectral modification methods; temporal decomposition; Degradation; Electronic mail; Hidden Markov models; Information science; Loudspeakers; Parameter estimation; Shape; Speech analysis; Speech enhancement; Speech processing; Gaussian mixture model (GMM); spectral voice conversion; temporal decomposition;
Conference_Title :
Communications and Electronics, 2008. ICCE 2008. Second International Conference on
Conference_Location :
Hoi An
Print_ISBN :
978-1-4244-2425-2
Electronic_ISBN :
978-1-4244-2426-9
DOI :
10.1109/CCE.2008.4578962