DocumentCode
2812765
Title
Voice characteristics conversion for HMM-based speech synthesis system
Author
Masuko, Takashi ; Tokuda, Keiichi ; Kobayashi, Takao ; Imai, Satoshi
Author_Institution
Precision & Intelligence Lab., Tokyo Inst. of Technol., Yokohama, Japan
Volume
3
fYear
1997
fDate
21-24 Apr 1997
Firstpage
1611
Abstract
We describe an approach to voice characteristics conversion for an HMM-based text-to-speech synthesis system. Since this speech synthesis system uses phoneme HMMs as speech units, voice characteristics conversion is achieved by changing the HMM parameters appropriately. To transform the voice characteristics of synthesized speech to the target speaker, we applied the maximum a posteriori estimation and vector field smoothing (MAP/VFS) algorithm to the phoneme HMMs. Using 5 or 8 sentences as adaptation data, speech samples synthesized from a set of adapted tied triphone HMMs, which have approximately 2,000 distributions, are judged to be closer to the target speaker by 79.7% or 90.6%, respectively, in an ABX listening test.
Keywords
hidden Markov models; maximum likelihood estimation; smoothing methods; speech processing; speech synthesis; ABX listening test; HMM based speech synthesis system; HMM parameters; MAP/VFS algorithm; adaptation data; adapted tied triphone HMM; distributions; maximum a posteriori estimation; phoneme HMM; sentences; speech samples; speech units; synthesized speech; target speaker; text to speech synthesis system; vector field smoothing; voice characteristics conversion; Cepstral analysis; Computer science; Data analysis; Electronic mail; Hidden Markov models; Laboratories; Spatial databases; Speech analysis; Speech synthesis; Testing
fLanguage
English
Publisher
ieee
Conference_Titel
Acoustics, Speech, and Signal Processing, 1997. ICASSP-97., 1997 IEEE International Conference on
Conference_Location
Munich
ISSN
1520-6149
Print_ISBN
0-8186-7919-0
Type
conf
DOI
10.1109/ICASSP.1997.598807
Filename
598807