Voice characteristics conversion for HMM-based speech synthesis system

Author

Masuko, Takashi; Tokuda, Keiichi; Kobayashi, Takao; Imai, Satoshi

Author_Institution

Precision & Intelligence Lab., Tokyo Inst. of Technol., Yokohama, Japan

Volume

3

fYear

1997

fDate

21-24 Apr 1997

Firstpage

1611

Abstract

We describe an approach to voice characteristics conversion for an HMM-based text-to-speech synthesis system. Because this system uses phoneme HMMs as speech units, voice characteristics can be converted by appropriately changing the HMM parameters. To transform the voice characteristics of the synthesized speech toward those of a target speaker, we applied the maximum a posteriori estimation and vector field smoothing (MAP/VFS) algorithm to the phoneme HMMs. With 5 or 8 sentences of adaptation data, speech samples synthesized from a set of adapted tied triphone HMMs with approximately 2,000 distributions were judged closer to the target speaker in 79.7% or 90.6% of cases, respectively, in an ABX listening test.
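The abstract names MAP/VFS without describing it. As a rough illustration only, the sketch below assumes the commonly cited formulation. First, a MAP update moves each Gaussian mean toward the adaptation data, with a prior weight `tau`. Second, a VFS-style step handles distributions that received no adaptation data: it estimates their shift ("transfer") vectors from the k nearest adapted neighbours, each weighted by exp(-distance / f). The function names, the defaults for `tau`, `k` and `f`, and the omission of VFS's second pass (smoothing the already-adapted transfer vectors) are assumptions of this sketch, not details taken from the paper.

```python
import math

def map_mean(prior_mean, frames, posteriors, tau=10.0):
    """MAP estimate of one Gaussian mean (assumed standard form):
    (tau * mu0 + sum_t gamma_t * x_t) / (tau + sum_t gamma_t)."""
    occupancy = sum(posteriors)
    dim = len(prior_mean)
    # Posterior-weighted sum of the adaptation frames, per dimension.
    acc = [sum(g * x[d] for g, x in zip(posteriors, frames)) for d in range(dim)]
    return [(tau * prior_mean[d] + acc[d]) / (tau + occupancy) for d in range(dim)]

def vfs_interpolate(means, adapted, k=3, f=1.0):
    """Hypothetical simplified VFS interpolation step.

    means   -- list of prior mean vectors, one per distribution
    adapted -- dict {index: MAP-adapted mean} for distributions that saw data
    Unadapted distributions are shifted by a weighted average of the transfer
    vectors (adapted - prior) of their k nearest adapted neighbours, with
    weights exp(-distance / f). The smoothing pass is omitted here.
    """
    if not adapted:
        raise ValueError("need at least one adapted distribution")
    transfer = {i: [a - m for a, m in zip(adapted[i], means[i])] for i in adapted}
    result = []
    for i, mu in enumerate(means):
        if i in transfer:
            result.append(list(adapted[i]))
            continue
        # k nearest adapted distributions, by Euclidean distance between priors.
        nearest = sorted((math.dist(mu, means[j]), j) for j in transfer)[:k]
        weights = [math.exp(-d / f) for d, _ in nearest]
        total = sum(weights)
        shift = [
            sum(w * transfer[j][d] for w, (_, j) in zip(weights, nearest)) / total
            for d in range(len(mu))
        ]
        result.append([m + s for m, s in zip(mu, shift)])
    return result
```

For example, `map_mean([0.0, 0.0], [[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0], tau=2.0)` gives `[1.0, 1.5]`. In this sketch, a larger `tau` keeps the mean closer to the prior when adaptation data are scarce. The interpolation step lets a few sentences of data influence distributions that were never observed, which matters for a tied-triphone set of roughly 2,000 distributions.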

Keywords

hidden Markov models; maximum likelihood estimation; smoothing methods; speech processing; speech synthesis; ABX listening test; HMM based speech synthesis system; HMM parameters; MAP/VFS algorithm; adaptation data; adapted tied triphone HMM; distributions; maximum a posteriori estimation; phoneme HMM; sentences; speech samples; speech units; synthesized speech; target speaker; text to speech synthesis system; vector field smoothing; voice characteristics conversion; Cepstral analysis; Computer science; Data analysis; Electronic mail; Hidden Markov models; Laboratories; Spatial databases; Speech analysis; Speech synthesis; Testing

fLanguage

English

Publisher

IEEE

Conference_Title

1997 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-97)

Conference_Location

Munich

ISSN

1520-6149

Print_ISBN

0-8186-7919-0

Type

conf

DOI

10.1109/ICASSP.1997.598807

Filename

598807