Author_Institution :
Department of Intelligent Systems, "Jožef Stefan" Institute, Jamova 39, SI-1000 Ljubljana, Slovenia
Abstract :
The Slovenian text-to-speech engine is a modular system consisting of four independent modules (text normalization, grapheme-to-phoneme conversion, prosody generation and segmental concatenation) connected in a pipeline. Each module handles one part of the conversion from text to speech. The first two modules perform tasks such as end-of-sentence detection, abbreviation and number expansion, conversion of special formats, morphological and contextual analysis, and phonological modelling. To derive the rules for our synthesis scheme, we collected data by analysing the readings of ten speakers, five male and five female. A two-level approach is used for duration modelling and a so-called superpositional approach for pitch modelling. Synthesis is based on the concatenation of speech units (diphones and some frequently used polyphones) using the TD-PSOLA technique.
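TD-PSOLA modifies the pitch of a stored unit in three steps. It cuts the waveform into short grains, each two pitch periods long, centred on the pitch marks and weighted by a window. It then places those grains at new synthesis marks spaced by the target period. Finally, it overlap-adds the grains. The sketch below covers only pitch scaling with the duration preserved. It assumes the analysis pitch marks are already known and strictly increasing. The function names (`hann`, `local_period`, `td_psola`) and the rule of reusing the nearest analysis grain are illustrative choices, not the engine's actual implementation.

```python
import math


def hann(n):
    # Symmetric Hann window of length n (n >= 3 here, since n = 2 * period + 1).
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def local_period(marks, i):
    # Local pitch period (in samples) at analysis mark i, taken from a neighbouring mark.
    if i + 1 < len(marks):
        return marks[i + 1] - marks[i]
    return marks[i] - marks[i - 1]


def td_psola(signal, marks, pitch_factor):
    """Minimal TD-PSOLA pitch scaling (sketch; assumes >= 2 strictly increasing marks).

    pitch_factor > 1 raises the pitch, < 1 lowers it; the time axis is kept,
    so the duration of the unit is unchanged.
    """
    out = [0.0] * len(signal)
    t = float(marks[0])  # current synthesis pitch mark
    while t <= marks[-1]:
        # Reuse the grain of the analysis mark nearest to the synthesis mark.
        i = min(range(len(marks)), key=lambda j: abs(marks[j] - t))
        period = local_period(marks, i)
        window = hann(2 * period + 1)  # two-period grain
        centre = int(round(t))
        for k, w in enumerate(window):
            src = marks[i] - period + k  # sample in the analysis grain
            dst = centre - period + k    # where it lands in the output
            if 0 <= src < len(signal) and 0 <= dst < len(out):
                out[dst] += w * signal[src]
        # Synthesis marks are spaced by the modified period.
        t += period / pitch_factor
    return out
```

With `pitch_factor=1` the synthesis marks coincide with the analysis marks. Hann windows of length 2P+1 overlapped at hop P sum to one, so the samples between the first and last pitch marks are reproduced unchanged. Any other factor changes only the grain spacing, which is why TD-PSOLA is cheap enough to apply at every concatenation point.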