DocumentCode :
1389080
Title :
Statistical Text-to-Speech Synthesis Based on Segment-Wise Representation With a Norm Constraint
Author :
Tiomkin, Stas ; Malah, David ; Shechtman, Slava
Author_Institution :
Dept. of Electr. Eng., Technion - Israel Inst. of Technol., Haifa, Israel
Volume :
18
Issue :
5
fYear :
2010
fDate :
July 2010
Firstpage :
1077
Lastpage :
1082
Abstract :
In statistical HMM-based text-to-speech systems (STTS), speech feature dynamics are modeled by first- and second-order feature frame differences, which typically do not satisfactorily represent the frame-to-frame feature dynamics present in natural speech. The reduced dynamics result in over-smoothing of the speech features, which often makes the synthesized speech sound muffled. In this correspondence, we propose a method to enhance a baseline STTS system by introducing a segment-wise model representation with a norm constraint. The segment-wise representation provides additional degrees of freedom in speech feature determination. We exploit these degrees of freedom to increase the speech feature vector norm so that it matches a norm constraint. As a result, the statistically generated speech features are less over-smoothed, yielding more natural-sounding speech, as judged by listening tests.
Keywords :
feature extraction; hidden Markov models; speech synthesis; statistical analysis; HMM; STTS; natural speech; norm constraint; segment-wise representation; speech feature dynamics; statistical text-to-speech synthesis; segment-wise model representation; statistical TTS; text-to-speech (TTS) synthesis
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2010.2040795
Filename :
5393045