• DocumentCode
    2301733
  • Title
    Phoneme-based spectral voice conversion using temporal decomposition and Gaussian mixture model
  • Author
    Phu Nguyen, Binh; Akagi, Masato
  • Author_Institution
    Sch. of Inf. Sci., Japan Adv. Inst. of Sci. & Technol., Nomi
  • fYear
    2008
  • fDate
    4-6 June 2008
  • Firstpage
    224
  • Lastpage
    229
  • Abstract
    Among state-of-the-art voice conversion systems, GMM-based methods are regarded as some of the best. However, the quality of the converted speech is still far from natural. There are three main reasons for this degradation: (i) the distribution of acoustic features is often modeled using unstable frames, which degrades the precision of the GMM parameters; (ii) the transformation function may generate discontinuous features if frames are processed independently; and (iii) an over-smoothing effect occurs in each converted frame. This paper presents a new spectral voice conversion method that addresses the first two drawbacks of standard spectral modification methods: insufficient precision of the GMM parameters and insufficient smoothness of the converted spectra between frames. A speech analysis technique called temporal decomposition (TD), which decomposes speech into event targets and event functions, is used to model the spectral evolution effectively. To improve the estimation of the GMM parameters, we use phoneme-based features of event targets as spectral vectors in the training procedure, which takes into account the relations between spectral parameters within each phoneme and avoids using spectral parameters from transition parts. To enhance the continuity of the speech spectra, we convert only the event targets instead of converting source features to target features frame by frame; the smoothness of the converted speech is then ensured by the shape of the event functions. Experimental results show that the proposed spectral voice conversion method improves both the speech quality and the speaker individuality of the converted speech.
  • Keywords
    Gaussian processes; speech enhancement; Gaussian mixture model (GMM); acoustic features; over-smooth effect; phoneme-based spectral voice conversion; speech analysis technique; speech spectra enhancement; standard spectral modification methods; temporal decomposition; Degradation; Electronic mail; Hidden Markov models; Information science; Loudspeakers; Parameter estimation; Shape; Speech analysis; Speech processing; spectral voice conversion
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Communications and Electronics, 2008. ICCE 2008. Second International Conference on
  • Conference_Location
    Hoi An
  • Print_ISBN
    978-1-4244-2425-2
  • Electronic_ISBN
    978-1-4244-2426-9
  • Type
    conf
  • DOI
    10.1109/CCE.2008.4578962
  • Filename
    4578962