DocumentCode :
2643004
Title :
Tweets mining using WIKIPEDIA and impurity cluster measurement
Author :
Chen, Qing ; Shipper, Timothy ; Khan, Latifur
Author_Institution :
Dept. of Comput. Sci., Univ. of Texas at Dallas, Dallas, TX, USA
fYear :
2010
fDate :
23-26 May 2010
Firstpage :
141
Lastpage :
143
Abstract :
Twitter is one of the fastest-growing online social networking services. Tweets can be categorized into trends and are related to tags and to follower/following social relationships. Such categorization is neither accurate nor effective because tweet messages are short and the data corpus is noisy. In this paper, we attempt to overcome these challenges with an extended feature vector and a semi-supervised clustering technique. To achieve this goal, the training set is expanded with Wikipedia topic search results, and the feature set is extended. When building the clustering model and performing classification, an impurity measurement is introduced into our classifier platform. Our experimental results show that the proposed techniques outperform other classifiers, with reasonable precision and recall.
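The abstract does not define how the feature vector is extended or how impurity is measured, so the following is only a minimal sketch under stated assumptions and not the authors' method. It assumes each tweet is a bag of words, that terms from a Wikipedia search result (passed in as a hypothetical token list) are added with a hypothetical down-weight `weight`, that clusters are compared by Euclidean distance (one of the listed keywords), and that a cluster's impurity is the Shannon entropy of its known labels, with unlabeled tweets marked `None` to mimic the semi-supervised setting. All function names are illustrative.

```python
import math
from collections import Counter

def extend_features(tweet_tokens, wiki_tokens, weight=0.5):
    """Sparse bag-of-words vector for a tweet, extended with terms from a
    (hypothetical) Wikipedia search result, each added with `weight`."""
    vec = {}
    for t in tweet_tokens:
        vec[t] = vec.get(t, 0.0) + 1.0
    for t in wiki_tokens:
        vec[t] = vec.get(t, 0.0) + weight
    return vec

def euclidean(a, b):
    """Euclidean distance between two sparse vectors stored as dicts."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))

def centroid(vectors):
    """Component-wise mean of a list of sparse vectors."""
    c = {}
    for v in vectors:
        for k, x in v.items():
            c[k] = c.get(k, 0.0) + x / len(vectors)
    return c

def cluster_impurity(labels):
    """Assumed impurity: entropy (bits) of the known labels; None = unlabeled."""
    known = [y for y in labels if y is not None]
    if not known:
        return 0.0
    n = len(known)
    return -sum((c / n) * math.log2(c / n) for c in Counter(known).values())

def total_impurity(clusters):
    """Size-weighted mean impurity over clusters (each a list of labels)."""
    total = sum(len(c) for c in clusters)
    if total == 0:
        return 0.0
    return sum(len(c) / total * cluster_impurity(c) for c in clusters)

def classify(vec, clusters):
    """clusters: list of (centroid, labels). Returns the majority known
    label of the nearest cluster, or None if that cluster is unlabeled."""
    _, labels = min(clusters, key=lambda cl: euclidean(vec, cl[0]))
    known = [y for y in labels if y is not None]
    return Counter(known).most_common(1)[0][0] if known else None

# Toy data, invented purely for illustration.
sports = [extend_features(["goal", "match"], ["football"]),
          extend_features(["match", "win"], ["football"])]
tech = [extend_features(["iphone", "app"], ["apple"]),
        extend_features(["app", "update"], ["software"])]
clusters = [(centroid(sports), ["sports", None]),  # one unlabeled tweet
            (centroid(tech), ["tech", "tech"])]

print(classify(extend_features(["goal"], ["football"]), clusters))  # sports
print(cluster_impurity(["sports", "tech"]))  # 1.0
```

A pure cluster has impurity 0 and an evenly mixed two-class cluster has impurity 1 bit; a clustering procedure could, for example, penalize assignments that raise `total_impurity`, though how the paper combines impurity with distance is not stated here.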
Keywords :
Clustering algorithms; Computer science; Euclidean distance; Impurities; Nearest neighbor searches; Neural networks; Partitioning algorithms; Social network services; Twitter; Wikipedia; extended features; tweet mining; wikipedia;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Intelligence and Security Informatics (ISI), 2010 IEEE International Conference on
Conference_Location :
Vancouver, BC, Canada
Print_ISBN :
978-1-4244-6444-9
Type :
conf
DOI :
10.1109/ISI.2010.5484758
Filename :
5484758