Multi-Layer F0 Modeling for HMM-Based Speech Synthesis

Author

Wang, Cheng-Cheng ; Ling, Zhen-Hua ; Zhang, Bu-Fan ; Dai, Li-Rong

Author_Institution

iFlytek Speech Lab., Univ. of Sci. & Technol. of China, Hefei, China

fYear

2008

fDate

16-19 Dec. 2008

Firstpage

1

Lastpage

4

Abstract

This paper proposes a two-layer fundamental frequency (F0) modeling method for HMM-based parametric speech synthesis. In the conventional HMM-based speech synthesis system, F0 models are trained for each context-dependent phoneme. Considering the super-segmental characteristics of F0 features, this paper introduces an explicit syllable-layer F0 model. At the synthesis stage, the F0 contour is generated by maximizing the combined likelihood functions of the phone-layer and syllable-layer F0 models. The objective and subjective evaluation results in our experiments show that the proposed multi-layer F0 modeling method can improve the performance of F0 prediction for emotional speech synthesis.

Keywords

hidden Markov models; maximum likelihood estimation; speech synthesis; HMM-based speech synthesis; maximum combined likelihood functions; multi-layer modeling; two-layer fundamental frequency modeling method; Context modeling; Frequency synthesizers; Hidden Markov models; Predictive models; Probability distribution; Spatial databases; Speech analysis; Speech recognition; Speech synthesis; Stress

fLanguage

English

Publisher

IEEE

Conference_Title

Chinese Spoken Language Processing, 2008. ISCSLP '08. 6th International Symposium on

Conference_Location

Kunming

Print_ISBN

978-1-4244-2942-4

Electronic_ISBN

978-1-4244-2943-1

Type

conf

DOI

10.1109/CHINSL.2008.ECP.44

Filename

4730298