Pronunciation Variant Selection for Spontaneous Speech Synthesis: Listening Effort as a Quality Parameter

Author

Werner, Steffen ; Wolff, Matthias ; Hoffmann, Rüdiger

Author_Institution

Lab. of Acoust. & Speech Commun., Dresden Univ. of Technol.

Volume

1

fYear

2006

fDate

14-19 May 2006

Abstract

In previous work (see, for instance, S. Werner et al. (2004)) we introduced different duration control methods for speech synthesis. The most notable approach controls the grapheme-to-phoneme conversion (and thus indirectly the speaking rate) by selecting (reduced) pronunciation variants according to a pronunciation variant sequence model. Listeners accept long synthesized utterances only if the listening effort is nearly the same as when listening to natural speech. To evaluate the quality of the variant synthesis against the canonical synthesis (the state-of-the-art system), we performed a listening test with two different synthesis systems. The variant synthesis applying a pronunciation variant sequence model achieved a significantly lower listening effort and a higher overall rating (MOS) than the canonical synthesis. We also show that listening effort can act as a quality parameter for a speech sample: the rating for listening effort is correlated with the ratings of naturalness and intelligibility of synthesized speech samples.

Keywords

speech synthesis; canonical synthesis; listening effort; listening test; pronunciation variant selection; spontaneous speech synthesis; Acoustics; Communication system control; Control system synthesis; Databases; Laboratories; Natural languages; Oral communication; Performance evaluation; Testing

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on

Conference_Location

Toulouse

ISSN

1520-6149

Print_ISBN

1-4244-0469-X

Type

conf

DOI

10.1109/ICASSP.2006.1660156

Filename

1660156