DocumentCode
2066754
Title
Synthesis of emotional speech using RP-PSOLA
Author
Vine, Daniel S G ; Sahandi, Reza
fYear
2000
fDate
2000
Firstpage
8/1
Lastpage
8/6
Abstract
Whilst TD-PSOLA remains an adequate solution for neutral speech synthesis, it is less suitable for emotional speech styles, which require more extreme pitch manipulation. Reducing the extent of the necessary pitch manipulation could lessen the distortions and artefacts introduced by TD-PSOLA. To accomplish this, a method for recording concatenative units with f0 values similar to the target intonation has been devised. This technique, termed reference pitch prompted recording, involves a speaker recording concatenative units at a set pitch. The speaker is guided by a 'reference pitch prompt' (RPP), which is a monotonic, hummed note. In RP-PSOLA (reference pitch-PSOLA) synthesis, RPP-recorded units such as syllables are concatenated and an intonation contour is applied using TD-PSOLA. RP-PSOLA can be extended so that several versions of each syllable are recorded, each at a different pitch. In this synthesis technique, termed multiple pitch RP-PSOLA, syllables are selected from an inventory to approximate the target f0 contour and are then concatenated. This paper compares the RP-PSOLA and multiple pitch RP-PSOLA synthesis methods in terms of the perceived distortion in emotional synthetic sentences, via a listening experiment. The results showed that, overall, multiple pitch RP-PSOLA is perceived to produce marginally less distorted synthetic speech than RP-PSOLA.
fLanguage
English
Publisher
IET
Conference_Title
State of the Art in Speech Synthesis (Ref. No. 2000/058), IEE Seminar on
Conference_Location
London
Type
conf
DOI
10.1049/ic:20000325
Filename
846964