Regional Information Center for Science and Technology (RICeST) - Continuous F0 Modeling for HMM Based Statistical Parametric Speech Synthesis

DocumentCode :

1326337

Title :

Continuous F0 Modeling for HMM Based Statistical Parametric Speech Synthesis

Author :

Yu, Kai ; Young, Steve

Author_Institution :

Eng. Dept., Cambridge Univ., Cambridge, UK

Volume :

Issue :

fYear :

2011

fDate :

7/1/2011

Firstpage :

1071

Lastpage :

1079

Abstract :

The modeling of fundamental frequency, or F0, in HMM-based speech synthesis is a critical factor in delivering speech that both sounds natural and accurately conveys the many nuances of the message. However, F0 modeling is difficult because F0 values are normally considered to depend on a binary voicing decision, such that they are continuous in voiced regions and undefined in unvoiced regions. F0 is therefore a discontinuous function of time. The multi-space probability distribution HMM (MSDHMM) is a widely used solution to this problem. The MSDHMM essentially uses a joint distribution of discrete voicing labels and the discontinuous F0 observations. However, due to the discontinuity assumption, the MSDHMM provides a rather weak F0 trajectory model. In this paper, F0 is instead viewed as a continuous function of time, which is achieved by assuming that F0 can be observed within unvoiced regions as well as voiced regions. This provides a continuous F0 data stream that can be modeled by standard HMMs. Voicing labels are modeled either implicitly or explicitly in order to perform voicing classification, and a globally tied distribution (GTD) technique is used to achieve robust F0 estimation. Both objective measures and subjective listening tests demonstrate that continuous F0 modeling yields better synthesized F0 trajectories and significant improvements in the naturalness of synthesized speech compared with the MSDHMM.
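The central idea in the abstract, treating F0 as observable in unvoiced regions so that a single continuous stream can be modeled, can be illustrated with a minimal sketch. The abstract does not say how the unvoiced values are obtained, so the linear interpolation used here is an assumption made purely for illustration, as are the hypothetical function name `make_continuous_f0` and the convention that unvoiced frames are marked with 0.0. The voicing flags are returned separately, loosely mirroring the explicit voicing labels mentioned above.

```python
from typing import List, Tuple


def make_continuous_f0(f0: List[float]) -> Tuple[List[float], List[bool]]:
    """Return a gap-free F0 track plus per-frame voicing flags.

    Assumptions (not taken from the paper): unvoiced frames are marked
    with 0.0, and gaps are filled by linear interpolation.
    """
    voiced = [value > 0.0 for value in f0]
    idx = [i for i, is_voiced in enumerate(voiced) if is_voiced]
    if not idx:
        raise ValueError("no voiced frames to interpolate from")

    out = list(f0)

    # Hold the first and last voiced values at the utterance edges.
    for i in range(idx[0]):
        out[i] = f0[idx[0]]
    for i in range(idx[-1] + 1, len(f0)):
        out[i] = f0[idx[-1]]

    # Linearly interpolate across each interior unvoiced gap.
    for a, b in zip(idx, idx[1:]):
        for i in range(a + 1, b):
            t = (i - a) / (b - a)
            out[i] = f0[a] + t * (f0[b] - f0[a])

    return out, voiced


# Toy track in Hz; 0.0 marks unvoiced frames.
track = [0.0, 100.0, 0.0, 0.0, 130.0, 0.0]
continuous, voiced = make_continuous_f0(track)
```

The resulting `continuous` list has a value in every frame, so it could in principle be fed to a standard (single-space) HMM stream, while `voiced` keeps the voicing decision available as a separate label sequence.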

Keywords :

hidden Markov models; signal classification; speech synthesis; statistical distributions; HMM-based speech synthesis; binary voicing decision; continuous modeling; discrete voicing labels; globally tied distribution technique; multispace probability distribution HMM; statistical parametric speech synthesis; trajectory model; voicing classification; Joints; Probability distribution; Random variables; Speech; Trajectory; F0 modeling; hidden Markov model (HMM)-based synthesis

fLanguage :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1558-7916

Type :

jour

DOI :

10.1109/TASL.2010.2076805

Filename :

5575397

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1326337