Identifying accents in Italian text: a preprocessing step in TTS

Author

Jing, Hongyan

Author_Institution

Lucent Technol. Bell Labs., Murray Hill, NJ, USA

fYear

2002

fDate

11-13 Sept. 2002

Firstpage

151

Lastpage

154

Abstract

Diacritic marks are often missing in informal communications such as emails; even in well-formatted corpora, diacritic marks are not consistently present. A text-to-speech synthesis system needs a preprocessor to restore the missing accents in order to produce correct word pronunciation. We present an algorithm for automatically identifying accents in Italian text. We treat accent identification as a classification problem and use supervised learning to automatically induce classification rules for disambiguating accents. The overall accuracy is 99.6% when tested on over 2000 ambiguous words in a 420 MB corpus. For the most ambiguous words, the program achieved 91.4% accuracy, compared with the 71.3% baseline. This accent identification system can serve as a preprocessor for a TTS system, invoked only when the input text contains accent-ambiguous words.
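
The abstract frames accent restoration as classification over accent-ambiguous words, learned from accented text and applied only to those words. The following is a minimal illustrative sketch of that framing, not the paper's algorithm: the neighbouring-word features, the vote-counting classifier, the function names (strip_accents, context_features, train, restore), and the toy training sentences are all assumptions made for illustration. Tokens are assumed to be lowercased already.

```python
import unicodedata
from collections import Counter, defaultdict


def strip_accents(word):
    """Return the word with combining diacritics removed (e.g. 'è' -> 'e')."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def context_features(tokens, i):
    """Hypothetical feature set: the unaccented neighbours of token i."""
    prev_word = strip_accents(tokens[i - 1]) if i > 0 else "<s>"
    next_word = strip_accents(tokens[i + 1]) if i + 1 < len(tokens) else "</s>"
    return [("prev", prev_word), ("next", next_word)]


def train(sentences, ambiguous_forms):
    """Count, for each ambiguous unaccented form, how often each accented
    spelling occurs overall and alongside each context feature."""
    prior = defaultdict(Counter)
    evidence = defaultdict(Counter)
    for tokens in sentences:
        for i, token in enumerate(tokens):
            form = strip_accents(token)
            if form not in ambiguous_forms:
                continue
            prior[form][token] += 1
            for feat in context_features(tokens, i):
                evidence[(form, feat)][token] += 1
    return prior, evidence


def restore(tokens, model):
    """Rewrite only the accent-ambiguous tokens; leave all others untouched."""
    prior, evidence = model
    output = []
    for i, token in enumerate(tokens):
        form = strip_accents(token)
        if form not in prior:
            output.append(token)
            continue
        votes = Counter()
        for feat in context_features(tokens, i):
            votes.update(evidence[(form, feat)])
        # Rank by context votes first, then by overall frequency, so that
        # with no context evidence the most frequent spelling is chosen.
        best = max(prior[form], key=lambda label: (votes[label], prior[form][label]))
        output.append(best)
    return output


# Toy accented training text (invented for this sketch).
training = [
    "il papa è a roma".split(),
    "mio papà e mia mamma".split(),
    "il papa e i cardinali".split(),
    "mio papà è stanco".split(),
]
model = train(training, {"e", "papa"})
restored = restore("mio papa e stanco".split(), model)
```

Falling back to the most frequent spelling when no context feature matches corresponds to a most-frequent-form baseline of the kind the abstract compares against; a real system would need far richer features and training data than this toy example.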

Keywords

learning (artificial intelligence); pattern classification; speech processing; speech synthesis; word processing; Italian text; TTS; accent disambiguation; accent identification; classification rules; correct word pronunciation; diacritic marks; preprocessing step; supervised learning; text-to-speech synthesis; Data preprocessing; Electronic mail; Engines; Humans; Speech synthesis; Testing;

fLanguage

English

Publisher

IEEE

Conference_Titel

Speech Synthesis, 2002. Proceedings of 2002 IEEE Workshop on

Print_ISBN

0-7803-7395-2

Type

conf

DOI

10.1109/WSS.2002.1224396

Filename

1224396