DocumentCode :
1937447
Title :
Identifying accents in Italian text: a preprocessing step in TTS
Author :
Jing, Hongyan
Author_Institution :
Lucent Technol. Bell Labs., Murray Hill, NJ, USA
fYear :
2002
fDate :
11-13 Sept. 2002
Firstpage :
151
Lastpage :
154
Abstract :
Diacritic marks are often missing in informal communications such as emails; even in well-formatted corpora, diacritic marks are not consistently present. A text-to-speech synthesis system needs a preprocessor to restore the missing accents in order to produce correct word pronunciation. We present an algorithm for automatically identifying accents in Italian text. We consider accent identification as a classification problem and use supervised learning to automatically induce classification rules for disambiguating accents. The overall accuracy is 99.6% when tested on over 2000 ambiguous words in a 420 MB corpus. For the most ambiguous words, the program achieved 91.4% accuracy, compared with the 71.3% baseline. This accent identification system can serve as a preprocessor for a TTS system, invoked only when the input text contains accent-ambiguous words.
Keywords :
learning (artificial intelligence); pattern classification; speech processing; speech synthesis; word processing; Italian text; TTS; accent disambiguation; accent identification; classification rules; correct word pronunciation; diacritic marks; preprocessing step; supervised learning; text-to-speech synthesis; Data preprocessing; Electronic mail; Engines; Humans; Speech synthesis; Testing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Speech Synthesis, 2002. Proceedings of 2002 IEEE Workshop on
Print_ISBN :
0-7803-7395-2
Type :
conf
DOI :
10.1109/WSS.2002.1224396
Filename :
1224396