• DocumentCode
    1937447
  • Title
    Identifying accents in Italian text: a preprocessing step in TTS
  • Author
    Jing, Hongyan
  • Author_Institution
    Lucent Technol. Bell Labs., Murray Hill, NJ, USA
  • fYear
    2002
  • fDate
    11-13 Sept. 2002
  • Firstpage
    151
  • Lastpage
    154
  • Abstract
    Diacritic marks are often missing in informal communications such as emails; even in well-formatted corpora, they are not consistently present. A text-to-speech (TTS) synthesis system needs a preprocessor to restore the missing accents in order to produce correct word pronunciation. We present an algorithm for automatically identifying accents in Italian text. We treat accent identification as a classification problem and use supervised learning to automatically induce classification rules for disambiguating accents. The overall accuracy is 99.6% when tested on over 2000 ambiguous words in a 420 MB corpus. For the most ambiguous words, the program achieved 91.4% accuracy, compared with the 71.3% baseline. This accent identification system can serve as a preprocessor for a TTS system, invoked only when the input text contains accent-ambiguous words.
  • Keywords
    learning (artificial intelligence); pattern classification; speech processing; speech synthesis; word processing; Italian text; TTS; accent disambiguation; accent identification; classification rules; correct word pronunciation; diacritic marks; preprocessing step; supervised learning; text-to-speech synthesis; Data preprocessing; Electronic mail; Engines; Humans; Speech synthesis; Testing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Speech Synthesis, 2002. Proceedings of 2002 IEEE Workshop on
  • Print_ISBN
    0-7803-7395-2
  • Type
    conf
  • DOI
    10.1109/WSS.2002.1224396
  • Filename
    1224396
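
The abstract frames accent identification as a classification problem over accent-ambiguous words. The record does not describe the features or the rule-induction method, so the following is only a minimal sketch of the general idea, not the paper's algorithm: it swaps in a simple neighbouring-word vote, with a most-frequent-form fallback standing in for a baseline. The function names (`strip_accents`, `train`, `restore`) and the toy training sentences are assumptions made for illustration.

```python
import unicodedata
from collections import Counter, defaultdict


def strip_accents(word):
    """Remove combining diacritics, e.g. 'papà' -> 'papa'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def train(sentences):
    """Collect counts from correctly accented text (hypothetical toy trainer)."""
    form_counts = defaultdict(Counter)     # plain key -> accented form -> count
    context_counts = defaultdict(Counter)  # (key, side, neighbour) -> form -> count
    for sentence in sentences:
        tokens = sentence.lower().split()
        plain = [strip_accents(t) for t in tokens]
        for i, (form, key) in enumerate(zip(tokens, plain)):
            form_counts[key][form] += 1
            if i > 0:
                context_counts[(key, "prev", plain[i - 1])][form] += 1
            if i + 1 < len(tokens):
                context_counts[(key, "next", plain[i + 1])][form] += 1
    return form_counts, context_counts


def restore(tokens, form_counts, context_counts):
    """Return lowercased tokens with accents restored where the model can decide."""
    plain = [strip_accents(t.lower()) for t in tokens]
    restored = []
    for i, key in enumerate(plain):
        candidates = form_counts.get(key)
        if not candidates:
            # Word never seen in training: leave it unchanged.
            restored.append(tokens[i])
            continue
        if len(candidates) == 1:
            # Only one known form: nothing to disambiguate.
            restored.append(next(iter(candidates)))
            continue
        # Ambiguous word: let the left and right neighbours vote.
        votes = Counter()
        if i > 0:
            votes.update(context_counts.get((key, "prev", plain[i - 1]), {}))
        if i + 1 < len(plain):
            votes.update(context_counts.get((key, "next", plain[i + 1]), {}))
        # With no contextual evidence, fall back to the most frequent form.
        source = votes if votes else candidates
        restored.append(source.most_common(1)[0][0])
    return restored


# Toy training data (invented for this sketch, not from the paper's corpus).
model = train([
    "il papa è a roma",
    "mio papà è stanco",
    "il papa parla e saluta",
    "mio papà lavora e canta",
])
print(" ".join(restore("mio papa e stanco".split(), *model)))
```

Only words with more than one known accented form reach the voting step, which mirrors the abstract's point that the preprocessor is invoked only for accent-ambiguous input. A real system would need far more training text and richer features than adjacent words.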