DocumentCode :
3705827
Title :
Microtext normalization using probably-phonetically-similar word discovery
Author :
Richard Khoury
Author_Institution :
Department of Software Engineering, Lakehead University, Thunder Bay, Canada
fYear :
2015
Firstpage :
384
Lastpage :
391
Abstract :
Microtext normalization is the challenge of discovering the English words corresponding to the unusually-spelled words used in social-media messages and posts. In this paper, we propose a novel method for doing this by rendering both English and microtext words phonetically based on their spelling, and matching similar ones together. We present our algorithm to learn spelling-to-phonetic probabilities and to efficiently search the English language and match words together. Our results demonstrate that our system correctly handles many types of normalization problems.
Keywords :
"Dictionaries","Encyclopedias","Electronic publishing","Internet","Conferences"
Publisher :
IEEE
Conference_Title :
Wireless and Mobile Computing, Networking and Communications (WiMob), 2015 IEEE 11th International Conference on
Type :
conf
DOI :
10.1109/WiMOB.2015.7347988
Filename :
7347988