DocumentCode
737949
Title
Embracing Information Explosion without Choking: Clustering and Labeling in Microblogging
Author
Hu, Xia ; Tang, Lei ; Liu, Huan
Author_Institution
Department of Computer Science and Engineering, Texas A&M University, College Station, TX
Volume
1
Issue
1
fYear
2015
Firstpage
35
Lastpage
46
Abstract
The explosive popularity of microblogging services produces a large volume of microblogging messages. It is difficult for a user to quickly gauge his or her followees’ opinions when the user interface is overwhelmed by a large number of messages. Useful information is buried in disorganized, incomplete, and unstructured text messages. We propose to organize the large number of messages into clusters with meaningful cluster labels, thus providing an overview of the content that fulfills users’ information needs. Clustering and labeling microblogging messages is challenging because the messages are much shorter than conventional text documents. They usually cannot provide sufficient term co-occurrence information for capturing their semantic associations. As a result, traditional text representation models tend to yield unsatisfactory performance. In this paper, we present a text representation framework that harnesses the power of semantic knowledge bases, i.e., Wikipedia and WordNet. The originally uncorrelated texts are connected through the semantic representation, which enhances the performance of short text clustering and labeling. Experimental results on Twitter and Facebook datasets demonstrate the superior performance of our framework in handling noisy and short microblogging messages.
Keywords
Electronic publishing; Encyclopedias; Internet; Labeling; Semantics; Syntactics; Clustering; Microblogging; Semantic Knowledge
fLanguage
English
Journal_Title
Big Data, IEEE Transactions on
Publisher
IEEE
Type
jour
DOI
10.1109/TBDATA.2015.2451635
Filename
7153539