Title :
Improving segmental GMM based voice conversion method with target frame selection
Author :
Hung-Yan Gu ; Sung-Fung Tsai
Author_Institution :
Nat. Taiwan Univ. of Sci. & Technol., Taipei, Taiwan
Abstract :
In this paper, the voice conversion method based on segmental Gaussian mixture models (GMMs) is further improved by adding a target frame selection (TFS) module. Segmental GMMs replace a single GMM with a large number of mixture components by several voice-content-specific GMMs, each consisting of far fewer mixture components. In addition, TFS searches the target-speaker frame pool of the segment class to which the input frame belongs and selects a frame whose spectral features are close to the mapped feature vector. Both ideas are intended to alleviate the problem that converted spectral envelopes are often over-smoothed. To evaluate the two ideas, three voice conversion systems are constructed and used to conduct listening tests. The test results show that the system combining both ideas obtains much improved voice quality. In addition, the measured variance ratio (VR) values show that this system also obtains the highest VR.
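The target frame selection step described in the abstract, and the variance ratio used to evaluate it, can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: frames are plain lists of spectral coefficients (for example, discrete cepstral coefficients), closeness is measured with squared Euclidean distance, segment classes are arbitrary string labels, and VR is assumed to be the per-dimension variance of the converted frames divided by that of the target frames, averaged over dimensions. The names `select_target_frame`, `variance_ratio`, and the example class label are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_target_frame(
    mapped: Vector,
    segment_class: str,
    frame_pools: Dict[str, List[Vector]],
) -> Vector:
    """Assumed TFS: from the target-speaker pool of the input frame's segment
    class, return the frame nearest to the GMM-mapped feature vector."""
    pool = frame_pools.get(segment_class)
    if not pool:
        raise ValueError(f"no target frames for segment class {segment_class!r}")
    return min(pool, key=lambda frame: squared_distance(frame, mapped))


def variance_ratio(converted: List[Vector], target: List[Vector]) -> float:
    """Assumed VR definition: per-dimension population variance of the
    converted frames over that of the target frames, averaged over dimensions."""
    dims = len(target[0])
    ratios = []
    for d in range(dims):
        conv_var = statistics.pvariance([frame[d] for frame in converted])
        targ_var = statistics.pvariance([frame[d] for frame in target])
        ratios.append(conv_var / targ_var)
    return sum(ratios) / dims


# Usage: a toy pool for one hypothetical segment class.
pools = {"vowel_a": [[1.0, 0.0], [0.2, 0.9], [0.5, 0.5]]}
print(select_target_frame([0.3, 0.8], "vowel_a", pools))
print(variance_ratio([[0.5], [1.5]], [[0.0], [2.0]]))
```

Under this assumed definition, over-smoothed mapped vectors yield a VR below 1; replacing each mapped vector with a real target-speaker frame is one way to push the converted features back toward the target variance. Keying the frame pools by segment class mirrors how a segmental GMM would select its class-specific model.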
Keywords :
Gaussian processes; mixture models; speech processing; TFS; VR values; mapped feature vector; measured variance ratio; segmental GMM; segmental Gaussian mixture models; target frame selection; target-speaker frame pool; voice conversion method; voice conversion systems; voice quality; Cepstral analysis; Speech; Speech synthesis; Standards; Timbre; Vectors; GMM; discrete cepstral coefficient; frame selection; variance ratio; voice conversion;
Conference_Title :
Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on
Conference_Location :
Singapore
DOI :
10.1109/ISCSLP.2014.6936633