DocumentCode :
1395234
Title :
Improved Prosody Generation by Maximizing Joint Probability of State and Longer Units
Author :
Qian, Yao ; Wu, Zhizheng ; Gao, Boyang ; Soong, Frank K.
Author_Institution :
Microsoft Res. Asia, Beijing, China
Volume :
19
Issue :
6
fYear :
2011
Firstpage :
1702
Lastpage :
1710
Abstract :
The current state-of-the-art hidden Markov model (HMM)-based text-to-speech (TTS) can produce highly intelligible, synthesized speech with decent segmental quality. However, its prosody, especially at phrase or sentence level, still tends to be bland. This blandness is partially due to the fact that the state-based HMM is inadequate in capturing global, hierarchical suprasegmental information in speech signals. In this paper, to improve the TTS prosody, longer units are first explicitly modeled with appropriate parametric distributions. The resultant models are then integrated with the state-based baseline models in generating better prosody by maximizing the joint probability. Experimental results in both Mandarin and English show consistent improvements over our baseline system, which uses only a state-based prosody model. The improvements are both objectively measurable and subjectively perceivable.
Keywords :
hidden Markov models; optimisation; probability; speech synthesis; hidden Markov model; joint probability maximization; parametric distribution; prosody generation; state-based baseline model; text-to-speech prosody; Biological system modeling; Discrete cosine transforms; Joints; Mathematical model; Speech; Trajectory; Discrete cosine transforms (DCTs); statistical distributions
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2010.2097248
Filename :
5658121