Regional Information Center for Science and Technology

DocumentCode :

312284

Title :

Goethe for prosody

Author :

Rapp, Stefan

Author_Institution :

Inst. fur Maschinelle Sprachverarbeitung, Stuttgart Univ., Germany

Volume :

fYear :

1996

fDate :

3-6 Oct 1996

Firstpage :

1636

Abstract :

We describe how a recording of Goethe's “Die Leiden des jungen Werther”, published on a multimedia CD-ROM (J.W. Goethe, 1995), was made accessible for prosody research. The recording is of interest because of its prosodic richness: it displays a large variety of registers and speaking styles. Application areas are: development of prosody models for German TTS, unsupervised learning of pitch accent types, corpus search for research on prosody-semantics and prosody-syntax interaction, and the study of more global prosodic parameters (speaking rate, pitch range) defining registers or speaking style. The four-hour recording was segmented into phonemes, syllables and words using HMM speech recognition techniques (S. Rapp, 1995), together with a large pronunciation lexicon (R.H. Baayen et al., 1993). A part-of-speech tagger (H. Schmid, 1995) was applied to the corpus to yield time-aligned POS tags. The German adaptation of the tone sequence model of intonation used in Stuttgart (J. Mayer, 1995; C. Féry, 1993) inspired the parametrization of fundamental frequency. An intermediate phonetic representation layer is described that uses the syllable alignment to parametrize the F₀ contour as a superposition of three functions: a hyperbolic tangent, a Gaussian and a constant.
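The superposition named at the end of the abstract can be illustrated with a minimal sketch. The abstract gives only the three function types, so every parameter name below (`base`, `step`, `slope`, `centre`, `peak`, `mu`, `sigma`), the time axis in seconds and the values in Hz are assumptions made for illustration, not the paper's actual parametrization.

```python
import math


def f0_syllable(t, base, step, slope, centre, peak, mu, sigma):
    """Illustrative F0 value at time t: constant + tanh + Gaussian.

    All parameters are hypothetical; the source only states that the
    contour is a superposition of these three function types.
    """
    # Constant term: an overall pitch level (assumed to be in Hz).
    constant_part = base
    # Hyperbolic tangent: a smooth rise or fall of height 2 * step
    # centred at time `centre`, with steepness `slope`.
    tanh_part = step * math.tanh(slope * (t - centre))
    # Gaussian: a local bump of height `peak` around time `mu`.
    gauss_part = peak * math.exp(-0.5 * ((t - mu) / sigma) ** 2)
    return constant_part + tanh_part + gauss_part


# Sample an example contour over a 200 ms stretch (values are made up).
contour = [
    f0_syllable(ms / 1000.0, base=120.0, step=10.0, slope=20.0,
                centre=0.1, peak=30.0, mu=0.1, sigma=0.05)
    for ms in range(0, 201, 10)
]
```

In a sketch like this, the tanh term would capture a level change across the syllable and the Gaussian a local peak, while the constant sets the baseline; how the original work ties these parameters to the syllable alignment is not stated in the abstract.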

Keywords :

hidden Markov models; multimedia computing; natural languages; speech processing; speech recognition; German TTS; German adaptation; HMM speech recognition techniques; corpus search; global prosodic parameters; intermediate phonetic representation layer; intonation; large pronunciation lexicon; multimedia CD-ROM; part of speech tagger; phonemes; pitch accent types; prosody research; prosody semantics; prosody syntax interaction; speaking style; speaking styles; syllable alignment; time aligned POS tags; tone sequence model; unsupervised learning; words; CD recording; CD-ROMs; Displays; Read only memory; Stress

fLanguage :

English

Publisher :

IEEE

Conference_Titel :

Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on

Conference_Location :

Philadelphia, PA

Print_ISBN :

0-7803-3555-4

Type :

conf

DOI :

10.1109/ICSLP.1996.607938

Filename :

607938

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=312284