DocumentCode :
312284
Title :
Goethe for prosody
Author :
Rapp, Stefan
Author_Institution :
Inst. für Maschinelle Sprachverarbeitung, Stuttgart Univ., Germany
Volume :
3
fYear :
1996
fDate :
3-6 Oct 1996
Firstpage :
1636
Abstract :
We describe the way in which a recording of Goethe's “Die Leiden des jungen Werther” published on a multimedia CD-ROM (J.W. Goethe, 1995) was made accessible for prosody research. The recording is interesting for prosody research because of its prosodic richness: it displays a large variety of registers and speaking styles. Application areas are: development of prosody models for German TTS, unsupervised learning of pitch accent types, corpus search for research on prosody-semantics and prosody-syntax interaction, and the study of more global prosodic parameters (speaking rate, pitch range) defining registers or speaking style. The four-hour recording was segmented into phonemes, syllables and words using HMM speech recognition techniques (S. Rapp, 1995), together with a large pronunciation lexicon (R.H. Baayen et al., 1993). A part-of-speech tagger (H. Schmid, 1995) was applied to the corpus to yield time-aligned POS tags. The German adaptation of the tone sequence model of intonation used in Stuttgart (J. Mayer, 1995; C. Féry, 1993) inspired the parametrization of fundamental frequency. An intermediate phonetic representation layer is described that uses the syllable alignment to parametrize the F0 contour into a superposition of three functions: a hyperbolic tangent, a Gaussian and a constant.
Keywords :
hidden Markov models; multimedia computing; natural languages; speech processing; speech recognition; German TTS; German adaptation; HMM speech recognition techniques; corpus search; global prosodic parameters; intermediate phonetic representation layer; intonation; large pronunciation lexicon; multimedia CD-ROM; part of speech tagger; phonemes; pitch accent types; prosody research; prosody semantics; prosody syntax interaction; speaking style; speaking styles; syllable alignment; time aligned POS tags; tone sequence model; unsupervised learning; words; CD recording; CD-ROMs; Displays; Hidden Markov models; Natural languages; Read only memory; Speech recognition; Stress; Unsupervised learning;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on
Conference_Location :
Philadelphia, PA
Print_ISBN :
0-7803-3555-4
Type :
conf
DOI :
10.1109/ICSLP.1996.607938
Filename :
607938