DocumentCode
1466087
Title
Evaluation of Expressive Speech Synthesis With Voice Conversion and Copy Resynthesis Techniques
Author
Türk, Oytun ; Schröder, Marc
Author_Institution
Sensory, Inc., Portland, OR, USA
Volume
18
Issue
5
fYear
2010
fDate
7/1/2010
Firstpage
965
Lastpage
973
Abstract
Generating expressive synthetic voices requires carefully designed databases that contain a sufficient amount of expressive speech material. This paper investigates voice conversion and modification techniques to reduce database collection and processing efforts while maintaining acceptable quality and naturalness. In a factorial design, we study the relative contributions of voice quality and prosody as well as the amount of distortion introduced by the respective signal manipulation steps. The unit selection engine in our open-source, modular text-to-speech (TTS) framework MARY is extended with voice quality transformation using either GMM-based prediction or vocal tract copy resynthesis. These algorithms are then cross-combined with various prosody copy resynthesis methods. The overall expressive speech generation process functions as a postprocessing step on TTS outputs to transform neutral synthetic speech into aggressive, cheerful, or depressed speech. Cross-combinations of voice quality and prosody transformation algorithms are compared in listening tests for perceived expressive style and quality. The results show that there is a tradeoff between identification and naturalness. Combined modeling of both voice quality and prosody leads to the best identification scores at the expense of the lowest naturalness ratings. The fine detail of both voice quality and prosody, as preserved by the copy synthesis, contributed to better identification compared to the approximate models.
Keywords
speech synthesis; copy resynthesis techniques; database collection; expressive speech synthesis; prosody; voice conversion; voice quality; voice quality transformation
fLanguage
English
Journal_Title
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher
IEEE
ISSN
1558-7916
Type
jour
DOI
10.1109/TASL.2010.2041113
Filename
5444914