Improving segmental GMM based voice conversion method with target frame selection

Author

Hung-Yan Gu ; Sung-Fung Tsai

Author_Institution

Nat. Taiwan Univ. of Sci. & Technol., Taipei, Taiwan

fYear

2014

fDate

12-14 Sept. 2014

Firstpage

483

Lastpage

487

Abstract

In this paper, the voice conversion method based on segmental Gaussian mixture models (GMMs) is further improved by adding a target frame selection (TFS) module. Segmental GMMs replace a single GMM that has a large number of mixture components with several voice-content-specific GMMs, each consisting of far fewer mixture components. In addition, TFS is used to find a frame whose spectral features are close to the mapped feature vector, searching the target-speaker frame pool that corresponds to the segment class to which the input frame belongs. Both ideas are intended to alleviate the problem that converted spectral envelopes are often over-smoothed. To evaluate the two ideas, three voice conversion systems are constructed and used to conduct listening tests. The test results show that the system using the two ideas together obtains much improved voice quality. In addition, the measured variance ratio (VR) values show that the system adopting both ideas also obtains the highest VR value.
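
The two quantities named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the distance measure or the exact VR formula, so Euclidean distance and a per-dimension variance ratio averaged over dimensions are assumptions here, and the names `select_target_frame`, `variance_ratio`, and `frame_pools`, as well as the toy feature values, are hypothetical.

```python
import math
from statistics import pvariance


def select_target_frame(mapped, frame_pools, segment_class):
    """Pick the target-speaker frame nearest to the GMM-mapped vector.

    Only the pool for the input frame's segment class is searched.
    Euclidean distance is an assumption; the abstract names no metric.
    """
    pool = frame_pools[segment_class]
    return min(pool, key=lambda frame: math.dist(frame, mapped))


def variance_ratio(converted, target):
    """Average, over feature dimensions, of var(converted) / var(target).

    This definition is an assumption made for illustration. Under it,
    values well below 1 indicate over-smoothed converted features.
    """
    dims = len(target[0])
    ratios = []
    for d in range(dims):
        conv_var = pvariance([frame[d] for frame in converted])
        targ_var = pvariance([frame[d] for frame in target])
        ratios.append(conv_var / targ_var)
    return sum(ratios) / dims


# Toy two-dimensional "spectral" features, grouped by hypothetical segment class.
frame_pools = {
    "vowel": [(1.0, 2.0), (3.0, 1.0), (0.0, 0.0)],
    "fricative": [(5.0, 5.0), (6.0, 4.0)],
}
mapped = (2.6, 1.2)  # stand-in for the output of a segmental GMM mapping
selected = select_target_frame(mapped, frame_pools, "vowel")

converted = [(1.0, 1.0), (2.0, 3.0)]
target = [(0.0, 0.0), (2.0, 4.0)]
vr = variance_ratio(converted, target)
```

Replacing the mapped vector with a real frame from the target speaker is what lets frame selection restore spectral detail that averaging across mixture components tends to smooth away; a VR closer to 1 would reflect that under the definition assumed above.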

Keywords

Gaussian processes; mixture models; speech processing; TFS; VR values; mapped feature vector; measured variance ratio; segmental GMM; segmental Gaussian mixture models; target frame selection; target-speaker frame pool; voice conversion method; voice conversion systems; voice quality; Cepstral analysis; Speech; Speech synthesis; Standards; Timbre; Vectors; GMM; discrete cepstral coefficient; frame selection; variance ratio; voice conversion;

fLanguage

English

Publisher

IEEE

Conference_Titel

Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on

Conference_Location

Singapore

Type

conf

DOI

10.1109/ISCSLP.2014.6936633

Filename

6936633