Predicting Spectral and Prosodic Parameters for Unit Selection in Speech Synthesis

Author

Dong, Minghui ; Li, Haizhou

Author_Institution

Inst. for Infocomm Res. (I2R), Singapore, Singapore

fYear

2008

fDate

16-19 Dec. 2008

Firstpage

1

Lastpage

4

Abstract

We usually build a prosody model to predict the prosodic parameters, which are used as part of the criteria for unit selection. Spectral appropriateness of units is usually ensured by using the identities of context units, which are linguistic symbols. Without looking into the spectral properties of the actual signal, spectral mismatches are often perceived in the synthetic speech. In this paper, we propose to use MFCC as spectral parameters in addition to the prosodic parameters. By introducing the spectral parameters into the criteria for unit selection, the appropriateness of units can be determined by statistical models. Thus the possibility of abnormal spectral mismatches between the concatenated units can be reduced. Experiments show that the approach helps to improve the quality of synthetic speech.

Keywords

speech synthesis; MFCC; unit selection; Concatenated codes; Costs; Current measurement; Databases; Hidden Markov models; Measurement units; Mel frequency cepstral coefficient; Natural languages; Predictive models;

fLanguage

English

Publisher

IEEE

Conference_Titel

Chinese Spoken Language Processing, 2008. ISCSLP '08. 6th International Symposium on

Conference_Location

Kunming

Print_ISBN

978-1-4244-2942-4

Electronic_ISBN

978-1-4244-2943-1

Type

conf

DOI

10.1109/CHINSL.2008.ECP.45

Filename

4730299