DocumentCode :
2791608
Title :
Normalization of text messages for text-to-speech
Author :
Pennell, Deana L. ; Liu, Yang
Author_Institution :
Comput. Sci. Dept., Univ. of Texas at Dallas, Richardson, TX, USA
fYear :
2010
fDate :
14-19 March 2010
Firstpage :
4842
Lastpage :
4845
Abstract :
This paper describes a normalization system for text messages to allow them to be read by a TTS engine. To address the large number of texting abbreviations, we use a statistical classifier to learn when to delete a character. The features we use are based on character context, function, and position in the word and containing syllable. To ensure that our system is robust to different abbreviations for a word, we generate multiple abbreviation hypotheses for each word based on the classifier's prediction. We then reverse the mappings to enable prediction of English words from the abbreviations. Our results show that this approach is feasible and warrants further exploration.
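The abstract describes a three-step pipeline: per-character deletion decisions from contextual features, multiple abbreviation hypotheses per word, and a reversed abbreviation-to-word mapping. The Python block below is a minimal sketch of that pipeline under explicit assumptions, not the authors' implementation. The trained statistical classifier is replaced by a hypothetical hand-written rule (`delete_prob`). Syllable-position features are omitted because they require a syllabifier. The beam search used to produce the n-best hypotheses, and every name (`char_features`, `delete_prob`, `abbreviation_hypotheses`, `build_reverse_map`, `normalize`), are illustrative choices.

```python
from collections import defaultdict
import heapq

VOWELS = set("aeiou")


def char_features(word: str, i: int) -> dict:
    """Context, function, and position features for character i.

    Assumption: syllable-position features from the paper are omitted here.
    """
    c = word[i]
    return {
        "char": c,
        "prev": word[i - 1] if i > 0 else "<s>",          # left context
        "next": word[i + 1] if i + 1 < len(word) else "</s>",  # right context
        "is_vowel": c in VOWELS,                           # crude "function" feature
        "is_first": i == 0,
        "is_last": i == len(word) - 1,
        "rel_pos": i / max(len(word) - 1, 1),              # position in word, 0..1
    }


def delete_prob(f: dict) -> float:
    """Hypothetical stand-in for a trained classifier's P(delete | features)."""
    if f["is_first"]:
        return 0.0   # assumption: the first letter is always kept
    if f["is_vowel"]:
        return 0.8   # vowels are often dropped ("tomorrow" -> "tmrw")
    if f["char"] == f["prev"]:
        return 0.7   # second letter of a doubled pair
    return 0.1


def abbreviation_hypotheses(word: str, n_best: int = 3) -> list[tuple[str, float]]:
    """Beam search over keep/delete decisions; returns up to n_best (abbreviation, score)."""
    beam = [("", 1.0)]
    for i in range(len(word)):
        p_del = delete_prob(char_features(word, i))
        scores: dict[str, float] = defaultdict(float)
        for prefix, score in beam:
            scores[prefix + word[i]] += score * (1.0 - p_del)  # keep character i
            if p_del > 0.0:
                scores[prefix] += score * p_del                # delete character i
        # Identical prefixes reached by different deletion paths are merged above.
        beam = heapq.nlargest(n_best, scores.items(), key=lambda kv: kv[1])
    return beam


def build_reverse_map(vocab: list[str], n_best: int = 3) -> dict[str, list[tuple[str, float]]]:
    """Reverse the word -> abbreviation mappings into abbreviation -> candidate words."""
    rev: dict[str, list[tuple[str, float]]] = defaultdict(list)
    for w in vocab:
        for abbr, p in abbreviation_hypotheses(w, n_best):
            rev[abbr].append((w, p))
    return rev


def normalize(token: str, rev: dict[str, list[tuple[str, float]]]) -> str:
    """Return the highest-scoring English word for an abbreviation, else the token unchanged."""
    candidates = rev.get(token)
    if not candidates:
        return token
    return max(candidates, key=lambda wp: wp[1])[0]
```

For example, with the toy vocabulary `["tomorrow", "tonight", "text"]`, `normalize("tmrw", build_reverse_map(vocab))` returns `"tomorrow"`, and tokens with no known hypothesis pass through unchanged. Keeping several hypotheses per word, rather than only the single most likely abbreviation, is what lets one English word match the different ways people abbreviate it.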
Keywords :
speech synthesis; English words; character context; character function; character position; multiple abbreviation hypotheses; normalization system; statistical classifier; text messages; text-to-speech engine; Cellular phones; Computer science; Engines; Hidden Markov models; Machine learning; Natural languages; Robustness; Safety; Speech synthesis; Supervised learning; abbreviation; text messages; text normalization; text-to-speech;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on
Conference_Location :
Dallas, TX
ISSN :
1520-6149
Print_ISBN :
978-1-4244-4295-9
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2010.5495127
Filename :
5495127