DocumentCode :
2179331
Title :
Toward text message normalization: Modeling abbreviation generation
Author :
Pennell, Deana ; Liu, Yang
Author_Institution :
Comput. Sci. Dept., Univ. of Texas at Dallas, Dallas, TX, USA
fYear :
2011
fDate :
22-27 May 2011
Firstpage :
5364
Lastpage :
5367
Abstract :
This paper describes a text normalization system for deletion-based abbreviations in informal text. We propose using statistical classifiers to learn the probability of deleting a given character using features based on character context, position in the word and containing syllable, and function within the word. To ensure that our system is robust to different and previously unseen abbreviations for a word, we generate multiple abbreviation hypotheses for a word using the predictions from the classifiers. We then reverse the mappings to enable recovery of English words from the abbreviations. Different knowledge sources are used to disambiguate word candidates: abbreviation likelihood, length, and language model scores. Our results show that this approach is feasible and warrants further exploration in the future.
Keywords :
electronic messaging; probability; speech synthesis; text analysis; word processing; English word; SMS; abbreviation likelihood; character context; deletion-based abbreviation; toward text message normalization; Computational modeling; Context; Decoding; Error analysis; Hidden Markov models; Mathematical model; Twitter; abbreviation modeling; noisy text processing; text normalization
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on
Conference_Location :
Prague
ISSN :
1520-6149
Print_ISBN :
978-1-4577-0538-0
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2011.5947570
Filename :
5947570