DocumentCode :
146475
Title :
Short text clustering using numerical data based on n-gram
Author :
Kumar, Ravindra ; Mathur, Robin Prakash
Author_Institution :
Dept. of Comput. Sci. Eng., Lovely Prof. Univ., Phagwara, India
fYear :
2014
fDate :
25-26 Sept. 2014
Firstpage :
274
Lastpage :
276
Abstract :
Short text messages, especially mobile SMS messages, contain not only textual strings but also numeric values. Existing systems filter out and discard these numeric values. In this research, we develop a new approach that uses numeric values for feature extraction during clustering. We propose an algorithm that uses an n-gram approach to retrieve the pre-strings and post-strings of each numeric value, and then calculates the similarity between documents. Documents are partitioned into two types: purely textual documents and mixed documents. Text messaging is gaining popularity as a way to push short, informative notifications to users at any time. Using numerical values through n-grams plays an important role in the efficient clustering of text messages.
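The abstract does not give the algorithm's details, so the following is only a minimal Python sketch of the general idea under stated assumptions: word-level n-grams (the paper may use a different n-gram unit), a simple regex tokenizer, and cosine similarity as the document similarity measure. All function names (`tokenize`, `partition`, `numeric_context_features`, `cosine`) are hypothetical and are not taken from the paper.

```python
import re
from collections import Counter
from math import sqrt


def tokenize(text):
    """Split a message into lowercase word tokens and numeric tokens."""
    return re.findall(r"[a-z]+|\d+(?:[.,]\d+)*", text.lower())


def is_numeric(token):
    """A token is numeric if it starts with a digit (assumed convention)."""
    return token[0].isdigit()


def partition(messages):
    """Separate purely textual messages from mixed (text + number) ones."""
    textual, mixed = [], []
    for msg in messages:
        has_number = any(is_numeric(t) for t in tokenize(msg))
        (mixed if has_number else textual).append(msg)
    return textual, mixed


def numeric_context_features(text, n=2):
    """Count the pre-string and post-string (up to n tokens) of each number."""
    tokens = tokenize(text)
    features = Counter()
    for i, tok in enumerate(tokens):
        if not is_numeric(tok):
            continue
        pre = tokens[max(0, i - n):i]    # up to n tokens before the number
        post = tokens[i + 1:i + 1 + n]   # up to n tokens after the number
        if pre:
            features["pre:" + " ".join(pre)] += 1
        if post:
            features["post:" + " ".join(post)] += 1
    return features


def cosine(a, b):
    """Cosine similarity between two sparse feature-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    msgs = [
        "Rs 500 debited from account 1234",
        "Rs 900 debited from account 5678",
        "See you at the party tonight",
    ]
    textual, mixed = partition(msgs)
    f0, f1 = (numeric_context_features(m) for m in mixed)
    print(len(textual), len(mixed), round(cosine(f0, f1), 3))
```

In this sketch the numbers themselves are not compared; only the words surrounding them act as features, so two bank alerts with different amounts can still be judged similar. Whether the original method weights or combines these features differently is not stated in the abstract.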
Keywords :
data mining; electronic messaging; text analysis; document partitioning; document similarity; feature extraction; mixed documents; mobile SMS; n-gram approach; numeric values; numerical data; poststring retrieval; prestring retrieval; pure-textual documents; pushing field; short-text message clustering; textual strings; Clustering algorithms; Data mining; Electronic mail; Feature extraction; Java; Mobile communication; Vectors; Clustering; N-gram; VSM;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
2014 5th International Conference - Confluence The Next Generation Information Technology Summit (Confluence)
Conference_Location :
Noida
Print_ISBN :
978-1-4799-4237-4
Type :
conf
DOI :
10.1109/CONFLUENCE.2014.6949257
Filename :
6949257