DocumentCode :
2365498
Title :
Frequency warping for speaker adaption of text-to-speech synthesis
Author :
Weixun Gao ; Qiying Cao
Author_Institution :
Sch. of Inf. Sci. & Technol., Donghua University, Shanghai, China
fYear :
2010
fDate :
26-29 Sept. 2010
Firstpage :
307
Lastpage :
310
Abstract :
Vocal tract length normalization (VTLN) is generally used in speech recognition to remove individual speaker characteristics. In this paper, we apply VTLN to speaker adaptation for speech synthesis. We propose a new frequency warping approach that reduces the spectral distance between source and target speakers. The frequency warping function is based on a bilinear function, and the warping factor is generated dynamically, frame by frame. The warped spectra of the source speaker are then converted to line spectral pairs (LSPs) to train hidden Markov models (HMMs). The HMMs are further adapted by maximum likelihood linear regression (MLLR) with the target speaker's data. Experimental results show that our frequency warping approach brings the warped spectra of the source speaker closer to those of the target speaker, and that the resulting adapted HMMs outperform HMMs trained with unwarped spectra in terms of voice naturalness and speaker similarity.
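The abstract does not give the warping formula or the rule for choosing the warping factor, so the following is only a minimal sketch under stated assumptions. It uses the standard phase response of a first-order all-pass filter as the bilinear warp, and picks a per-frame factor by a simple grid search over squared spectral distance. The function names (`bilinear_warp`, `warp_spectrum`, `best_alpha`), the linear interpolation, and the candidate grid are all hypothetical choices for illustration, not the authors' method.

```python
import math


def bilinear_warp(omega, alpha):
    """Warp a normalized angular frequency omega in [0, pi].

    Assumed form: phase response of a first-order all-pass filter,
    with warping factor alpha in (-1, 1). alpha = 0 is the identity.
    """
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))


def warp_spectrum(spectrum, alpha):
    """Resample a spectrum (uniform bins over [0, pi]) along the warped axis.

    Needs at least two bins. For each output bin, the source frequency is
    found with the inverse warp, which for this bilinear form is the same
    function evaluated with -alpha. Values between bins are linearly
    interpolated (an illustrative choice).
    """
    n = len(spectrum)
    warped = []
    for k in range(n):
        omega = math.pi * k / (n - 1)
        pos = bilinear_warp(omega, -alpha) / math.pi * (n - 1)
        pos = min(max(pos, 0.0), float(n - 1))
        lo = min(int(pos), n - 2)
        frac = pos - lo
        warped.append((1.0 - frac) * spectrum[lo] + frac * spectrum[lo + 1])
    return warped


def best_alpha(source, target, candidates):
    """Pick the candidate factor minimizing squared distance for one frame.

    The grid search and the distance measure are assumptions; the source
    only states that the factor is generated frame by frame.
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(candidates,
               key=lambda a: dist(warp_spectrum(source, a), target))
```

In a frame-by-frame setting, `best_alpha` would be called once per aligned source/target frame pair, and the warped source frame would then be passed on to feature extraction.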
Keywords :
hidden Markov models; regression analysis; speech recognition; speech synthesis; bilinear function; frequency warping; maximum likelihood linear regression; speaker adaption; speaker similarity; spectrum distance; text-to-speech synthesis; vocal tract length normalization; voice naturalness; TTS; speaker adaptation
fLanguage :
English
Publisher :
IET
Conference_Titel :
Wireless, Mobile and Multimedia Networks (ICWMMN 2010), IET 3rd International Conference on
Conference_Location :
Beijing
Type :
conf
DOI :
10.1049/cp.2010.0677
Filename :
5703015