DocumentCode
1937178
Title
Expressive synthesis: how crucial is voice quality?
Author
Gobl, Christer ; Bennett, Eva ; Ní Chasaide, Ailbhe
Author_Institution
Centre for Language & Commun. Studies, Trinity Coll., Dublin, Ireland
fYear
2002
fDate
11-13 Sept. 2002
Firstpage
91
Lastpage
94
Abstract
This paper compares the emotive colouring that can be achieved in synthesis by f0 manipulations alone ('f0 only') with that achieved by manipulating f0 together with voice quality ('VQ+f0'), and asks how crucial large f0 excursions are in signalling strong emotions. Are they overwhelmingly important, with voice quality contributing mainly to finer distinctions for milder affects? Or are both voice quality and large f0 differences required for the strong emotions? The 'VQ+f0' stimuli, versions of an utterance synthesised using the LF voice source in KLSYN88 with breathy, whispery, lax-creaky, modal, tense and harsh voice qualities (Gobl et al. (2002)), were further manipulated to replicate the f0 differences described in Mozziconacci et al. (1995) for six emotions, each matched to the most appropriate voice quality. The 'f0 only' stimuli used the same set of f0 contours, but retained source settings for modal voice. Ten listeners rated the affective colouring of the stimuli on a seven-point scale, in terms of pairs of opposite attributes. For both strong and milder affects the 'VQ+f0' stimuli achieved much higher ratings than the 'f0 only' stimuli, which were relatively ineffective. Implications for the synthesis of expressive speech are discussed.
Keywords
speech processing; speech synthesis; LF voice source; emotive colouring; expressive synthesis; voice quality; Educational institutions; Mood; Signal synthesis; Speech synthesis; Testing;
fLanguage
English
Publisher
IEEE
Conference_Titel
Speech Synthesis, 2002. Proceedings of 2002 IEEE Workshop on
Print_ISBN
0-7803-7395-2
Type
conf
DOI
10.1109/WSS.2002.1224380
Filename
1224380