Regional Information Center for Science and Technology - Phoneme-based spectral voice conversion using temporal decomposition and Gaussian mixture model

DocumentCode :

2301733

Title :

Phoneme-based spectral voice conversion using temporal decomposition and Gaussian mixture model

Author :

Phu Nguyen, Binh ; Akagi, Masato

Author_Institution :

Sch. of Inf. Sci., Japan Adv. Inst. of Sci. & Technol., Nomi

fYear :

2008

fDate :

4-6 June 2008

Firstpage :

224

Lastpage :

229

Abstract :

Among state-of-the-art voice conversion systems, GMM-based methods are regarded as some of the best. However, the quality of converted speech is still far from natural. There are three main reasons for this degradation: (i) modeling the distribution of acoustic features often uses unstable frames, which degrades the precision of the GMM parameters; (ii) the transformation function may generate discontinuous features if frames are processed independently; and (iii) an over-smoothing effect occurs in each converted frame. This paper presents a new spectral voice conversion method that addresses the first two drawbacks of standard spectral modification methods: insufficient precision of the GMM parameters and insufficient smoothness of the converted spectra between frames. A speech analysis technique called temporal decomposition (TD), which decomposes speech into event targets and event functions, is used to model the spectral evolution effectively. To improve the estimation of the GMM parameters, we use phoneme-based features of event targets as spectral vectors in the training procedure, which takes into account the relations between spectral parameters within each phoneme and avoids using spectral parameters from transition parts. To enhance the continuity of the speech spectra, we convert only the event targets instead of converting source features to target features frame by frame, and the smoothness of the converted speech is ensured by the shape of the event functions. Experimental results show that the proposed spectral voice conversion method improves both the speech quality and the speaker individuality of converted speech.
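The idea of converting only the event targets and letting the event functions supply the smoothness can be illustrated with a minimal sketch. It relies on the usual temporal decomposition model, in which each spectral frame is approximated as a weighted sum of event targets, y(n) ≈ Σ_k a_k φ_k(n). Everything specific here is an assumption made for illustration and is not taken from the paper: the triangular event functions, the affine mapping that stands in for the GMM-based transformation, and all function and variable names.

```python
import numpy as np

def triangular_event_functions(centers, n_frames):
    """Hypothetical event functions: piecewise-linear bumps centered on each
    event location, overlapping only with their neighbors and summing to 1
    at every frame. Real TD estimates these from the speech signal."""
    frames = np.arange(n_frames)
    phi = np.zeros((len(centers), n_frames))
    for k in range(len(centers)):
        one_hot = np.zeros(len(centers))
        one_hot[k] = 1.0
        # Linear interpolation of a one-hot vector yields a triangular bump.
        phi[k] = np.interp(frames, centers, one_hot)
    return phi

def reconstruct(targets, phi):
    """TD synthesis: y(n) = sum_k a_k * phi_k(n).
    targets has shape (K, D), phi has shape (K, N); result has shape (N, D)."""
    return phi.T @ targets

def convert_targets(targets, A, b):
    """Placeholder affine mapping applied to each event target.
    Assumption: this stands in for the GMM-based transformation function."""
    return targets @ A.T + b

rng = np.random.default_rng(0)
centers = np.array([0, 10, 25, 40])          # assumed event locations (frame indices)
source_targets = rng.normal(size=(4, 3))     # K=4 events, D=3 spectral dimensions
phi = triangular_event_functions(centers, n_frames=41)

A = 0.9 * np.eye(3)                          # illustrative mapping parameters
b = np.full(3, 0.1)

# Convert only the K event targets, then rebuild all N frames with the
# unchanged event functions, which keeps the trajectory between events smooth.
converted_targets = convert_targets(source_targets, A, b)
converted_frames = reconstruct(converted_targets, phi)
```

In this sketch only four vectors are transformed, yet all 41 frames are produced, and each frame at an event location coincides with the corresponding converted target. Frames between events are interpolated by the event functions rather than converted independently.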

Keywords :

Gaussian processes; speech enhancement; Gaussian mixture model; acoustic features; over-smooth effect; phoneme-based spectral voice conversion; speech analysis technique; speech spectra enhancement; standard spectral modification methods; temporal decomposition; Degradation; Electronic mail; Hidden Markov models; Information science; Loudspeakers; Parameter estimation; Shape; Speech analysis; Speech enhancement; Speech processing; Gaussian mixture model (GMM); spectral voice conversion; temporal decomposition;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Communications and Electronics, 2008. ICCE 2008. Second International Conference on

Conference_Location :

Hoi An

Print_ISBN :

978-1-4244-2425-2

Electronic_ISBN :

978-1-4244-2426-9

Type :

conf

DOI :

10.1109/CCE.2008.4578962

Filename :

4578962

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2301733