• DocumentCode
    737949
  • Title
    Embracing Information Explosion without Choking: Clustering and Labeling in Microblogging
  • Author
    Hu, Xia ; Tang, Lei ; Liu, Huan
  • Author_Institution
    Department of Computer Science and Engineering, Texas A&M University, College Station, TX
  • Volume
    1
  • Issue
    1
  • fYear
    2015
  • Firstpage
    35
  • Lastpage
    46
  • Abstract
    The explosive popularity of microblogging services produces a large volume of microblogging messages. It is difficult for users to quickly gauge their followees’ opinions when the user interface is overwhelmed by a large number of messages. Useful information is buried in disorganized, incomplete, and unstructured text messages. We propose to organize the large number of messages into clusters with meaningful cluster labels, thus providing an overview of the content to fulfill users’ information needs. Clustering and labeling microblogging messages is challenging because the messages are much shorter than conventional text documents. They usually cannot provide sufficient term co-occurrence information for capturing their semantic associations. As a result, traditional text representation models tend to yield unsatisfactory performance. In this paper, we present a text representation framework that harnesses the power of semantic knowledge bases, i.e., Wikipedia and WordNet. The originally uncorrelated texts are connected through the semantic representation, which enhances the performance of short text clustering and labeling. The experimental results on Twitter and Facebook datasets demonstrate the superior performance of our framework in handling noisy and short microblogging messages.
  • Keywords
    Electronic publishing; Encyclopedias; Internet; Labeling; Semantics; Syntactics; Clustering; Microblogging; Semantic knowledge
  • fLanguage
    English
  • Journal_Title
    Big Data, IEEE Transactions on
  • Publisher
    IEEE
  • Type
    jour
  • DOI
    10.1109/TBDATA.2015.2451635
  • Filename
    7153539