• DocumentCode
    2544650
  • Title
    Discovering Communities with Self-Adaptive k Clustering in Microblog Data
  • Author
    Ting Huang; Dunlu Peng; Lidong Cao
  • Author_Institution
    School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai, China
  • fYear
    2012
  • fDate
    1-3 Nov. 2012
  • Firstpage
    383
  • Lastpage
    390
  • Abstract
    Nowadays, microblogging has become a popular social network service whose user population has grown dramatically in the past few years. Many companies regard microblogging services as an indispensable medium for obtaining timely opinions directly from customers and potential customers. A community in a social network is a group of people who share similar interests or pay attention to the same things. Recognizing user communities in microblogging services is important for identifying hot topics and users' interests, which can help companies improve their marketing strategies. However, the massive, unstructured tweet data poses a tremendous challenge to efficiently mining the valuable communities hidden in it. Tweets carry a large volume of information, span a wide range of fields, are short, and lack structure, which makes them quite different from conventional text documents. To analyze the data more effectively, we propose a set of techniques for preprocessing tweets, such as word identification, category matching, and data standardization. We then present an unsupervised learning method that automatically clusters microblog users into different communities. In this method, an optimized CLARANS algorithm is developed according to the characteristics of microblog data, and the interactive relationships between tweets are exploited during clustering to improve clustering quality. In addition, a self-adaptive k strategy is employed to make the approach more broadly applicable. To investigate the performance of our approach from different aspects, we conducted a series of experiments on microblog data collected from SINA Weibo.
  • Keywords
    data mining; pattern clustering; pattern matching; social networking (online); unsupervised learning; CLARANS algorithm; Sina Weibo; category matching; clustering quality; community discovery; data mining; data standardization; microblog data; microblogging; self-adaptive k-clustering; social network service; tweet data; tweet preprocessing; unsupervised learning method; user community recognition; word identification; Algorithm design and analysis; Clustering algorithms; Communities; Data mining; Market research; Probability; Social network services; adaptive k; clustering; community recognition; microblogging; social network;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    2012 Second International Conference on Cloud and Green Computing (CGC)
  • Conference_Location
    Xiangtan
  • Print_ISBN
    978-1-4673-3027-5
  • Type
    conf
  • DOI
    10.1109/CGC.2012.92
  • Filename
    6382845