Modeling pronunciation variation for spontaneous speech synthesis

Author

Werner, Steffen ; Wolff, Matthias ; Eichner, Matthias ; Hoffmann, R.

Author_Institution

Lab. of Acoust. & Speech Commun., Dresden Univ. of Technol., Germany

Volume

1

fYear

2004

fDate

17-21 May 2004

Abstract

Integrating pronunciation modeling into speech synthesis makes synthetic speech sound more natural and colloquial. Pronunciation variation is one observable effect in spontaneous speech, so modeling it is a step towards spontaneous speech synthesis. In previous work (see Proc. ICASSP, vol.1, p.417-20, Orlando, FL, USA, 2002 and Proc. ICASSP, Hong Kong, PR China, Apr. 2003) we introduced different duration control methods for speech synthesis. These methods are based on the observation that words which are very likely to occur in a given context are pronounced faster and less accurately than improbable ones (see D. Jurafsky et al., Proc. ICASSP, vol.2, p.801-4, Salt Lake City, USA, 2001). We therefore use the probability of a word in its context either to control the local speaking rate directly or to select appropriate pronunciation variants that change the local speaking rate. Extending these methods with a pronunciation sequence model, we incorporate knowledge about how well two subsequent variants fit together. With the proposed algorithm we could further improve the natural and colloquial listening impression.
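
The variant selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm, only one plausible reading under stated assumptions: the number of phones in a variant stands in for its duration, probable words are pushed towards shorter variants, and a hypothetical function `pair_prob` plays the role of the pronunciation sequence model by scoring how well two subsequent variants fit together. All names (`target_length`, `select_variants`, `pair_prob`, `weight`) are illustrative and do not come from the paper.

```python
import math


def target_length(variants, p_word):
    """Assumed mapping: p_word = 0 favors the longest variant, 1 the shortest."""
    lengths = [len(v) for v in variants]
    return max(lengths) - p_word * (max(lengths) - min(lengths))


def select_variants(words, word_probs, lexicon, pair_prob, weight=1.0):
    """Viterbi search choosing one pronunciation variant per word.

    words      : list of word strings
    word_probs : contextual probability of each word, in [0, 1]
    lexicon    : dict mapping word -> list of variants (tuples of phones)
    pair_prob  : hypothetical sequence model, (prev, cur) -> value in (0, 1]
    weight     : assumed trade-off between sequence model and length cost
    """
    # best[v] = (accumulated cost, path) over all paths ending in variant v
    first = lexicon[words[0]]
    tgt = target_length(first, word_probs[0])
    best = {v: (abs(len(v) - tgt), [v]) for v in first}

    for word, p in zip(words[1:], word_probs[1:]):
        variants = lexicon[word]
        tgt = target_length(variants, p)
        new_best = {}
        for v in variants:
            local = abs(len(v) - tgt)  # distance to the desired length
            cost, path = min(
                (
                    (c - weight * math.log(pair_prob(prev, v)) + local, pth)
                    for prev, (c, pth) in best.items()
                ),
                key=lambda item: item[0],
            )
            new_best[v] = (cost, path + [v])
        best = new_best

    # Return the cheapest complete path
    return min(best.values(), key=lambda item: item[0])[1]
```

With a uniform `pair_prob`, the choice depends only on word probability. A sequence model that assigns a low score to a particular pair of variants can override that choice, which is the effect the pronunciation sequence model is meant to capture.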

Keywords

probability; speech synthesis; stochastic processes; colloquial synthetic speech; duration control speech synthesis; local speaking rate; multiple duration control methods; natural synthetic speech; pronunciation lexicon; pronunciation sequence model; pronunciation variation; spoken word probability; spontaneous speech synthesis; stochastic Markov graph; variant sequence model; word boundary effects; Acoustics; Communication system control; Laboratories; Lattices; Natural languages; Oral communication; Probability; Speech synthesis; Stochastic processes; Testing;

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on

ISSN

1520-6149

Print_ISBN

0-7803-8484-9

Type

conf

DOI

10.1109/ICASSP.2004.1326075

Filename

1326075