DocumentCode :
2544650
Title :
Discovering Communities with Self-Adaptive k Clustering in Microblog Data
Author :
Ting Huang ; Dunlu Peng ; Lidong Cao
Author_Institution :
Sch. of Opt.-Electr. & Comput. Eng., Univ. of Shanghai for Sci. & Technol., Shanghai, China
fYear :
2012
fDate :
1-3 Nov. 2012
Firstpage :
383
Lastpage :
390
Abstract :
Microblogging has become a popular social network service whose user population has grown rapidly over the past few years. Many companies regard microblogging services as an indispensable medium for obtaining timely opinions directly from customers and potential customers. A community in a social network is a group of people who share similar interests or pay attention to the same things. Recognizing user communities in a microblogging service is important for identifying hot topics and users' interests, which in turn help companies improve their marketing strategies. However, the massive volume of unstructured tweet data makes it challenging to mine the valuable communities hidden in it efficiently. Tweet data carries a large amount of information, spans many domains, and consists of short, unstructured messages. This makes tweets quite different from conventional text documents. To analyze the data more effectively, we propose a set of techniques for preprocessing tweets, including word identification, category matching, and data standardization. We present an unsupervised learning method that automatically clusters microblog users into communities. The method uses an optimized CLARANS algorithm tailored to the characteristics of microblog data. During clustering, the interaction relationships between tweets are also exploited to improve clustering quality. In addition, a self-adaptive k strategy is employed to make the proposed approach more applicable. To evaluate the approach from different aspects, we conducted a series of experiments on microblog data collected from Sina Weibo.
Keywords :
data mining; pattern clustering; pattern matching; social networking (online); unsupervised learning; CLARANS algorithm; Sina Weibo; category matching; clustering quality; community discovery; data mining; data standardization; microblog data; microblogging; self-adaptive k-clustering; social network service; tweet data; tweet preprocessing; unsupervised learning method; user community recognition; word identification; Algorithm design and analysis; Clustering algorithms; Communities; Data mining; Market research; Probability; Social network services; adaptive k; clustering; community recognition; microblogging; social network;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Cloud and Green Computing (CGC), 2012 Second International Conference on
Conference_Location :
Xiangtan
Print_ISBN :
978-1-4673-3027-5
Type :
conf
DOI :
10.1109/CGC.2012.92
Filename :
6382845