DocumentCode :
498943
Title :
Improved blog clustering through automated weighting of text blocks
Author :
Li, Hongbo ; Ye, Yunming ; Huang, Joshu Zhexue
Author_Institution :
Shenzhen Grad. Sch., Harbin Inst. of Technol., Harbin, China
Volume :
3
fYear :
2009
fDate :
12-15 July 2009
Firstpage :
1586
Lastpage :
1591
Abstract :
In this paper, a new clustering algorithm is proposed for blog data. Using the structural information of text blocks in blog data, we partition the features of blog data into three groups and extend the k-means clustering algorithm to automatically calculate a weight for each feature group during clustering. We introduce a new objective function with group weight variables and use the method of Lagrange multipliers to derive a formula for the group weights. This formula is added as a new step in the standard iterative k-means process, so that the group weights are computed automatically from the distribution of the features. With this step, the clustering process is guaranteed to converge to a locally optimal solution. Experimental results on several blog data sets show that the new algorithm outperforms k-means without group feature weighting.
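The abstract does not state the objective function or the weight formula, so the following is only a minimal sketch under stated assumptions. It assumes a W-k-means-style objective P(U, Z, W) = Σ_l Σ_i u_il Σ_t w_t^β Σ_{j∈G_t} (x_ij − z_lj)^2 subject to Σ_t w_t = 1, where G_t is the index set of the t-th feature group, β > 1 is an assumed exponent, and squared Euclidean distance stands in for whatever text distance the paper actually uses. Minimizing this over W with a Lagrange multiplier gives w_t = 1 / Σ_s (D_t / D_s)^{1/(β−1)}, where D_t is the within-cluster dispersion of group t. The functions `group_weights` and `group_weighted_kmeans`, the `eps` guard, and the farthest-first initialization are all hypothetical choices made for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def _sqdist(a: Sequence[float], b: Sequence[float], idx: Sequence[int]) -> float:
    """Squared Euclidean distance restricted to the feature indices in idx."""
    return sum((a[j] - b[j]) ** 2 for j in idx)


def group_weights(dispersions: Sequence[float], beta: float = 2.0,
                  eps: float = 1e-12) -> List[float]:
    """Assumed closed form w_t = 1 / sum_s (D_t / D_s) ** (1 / (beta - 1)).

    eps is an assumed guard against a zero dispersion D_t.
    The returned weights are positive and sum to 1.
    """
    if beta <= 1.0:
        raise ValueError("beta must be greater than 1")
    d = [x + eps for x in dispersions]
    p = 1.0 / (beta - 1.0)
    return [1.0 / sum((dt / ds) ** p for ds in d) for dt in d]


def group_weighted_kmeans(X: Sequence[Sequence[float]], groups: Sequence[Sequence[int]],
                          k: int, beta: float = 2.0, max_iter: int = 100):
    """Hypothetical group-weighted k-means; returns (labels, centers, weights)."""
    all_idx = range(len(X[0]))
    # Deterministic farthest-first initialization (an assumed choice, not from the paper).
    centers: List[Vector] = [list(X[0])]
    while len(centers) < k:
        far = max(X, key=lambda x: min(_sqdist(x, c, all_idx) for c in centers))
        centers.append(list(far))

    weights = [1.0 / len(groups)] * len(groups)  # start with equal group weights
    labels = [-1] * len(X)
    for _ in range(max_iter):
        # Step 1: assign each object to the center with the smallest weighted distance.
        new_labels = []
        for x in X:
            costs = [sum(weights[t] ** beta * _sqdist(x, c, g) for t, g in enumerate(groups))
                     for c in centers]
            new_labels.append(min(range(k), key=costs.__getitem__))
        if new_labels == labels:
            break  # partition unchanged, so recomputed centers and weights would be too
        labels = new_labels

        # Step 2: move each center to the mean of its members (keep it if the cluster is empty).
        for l in range(k):
            members = [X[i] for i in range(len(X)) if labels[i] == l]
            if members:
                centers[l] = [sum(col) / len(members) for col in zip(*members)]

        # Step 3 (the added step): recompute group weights from within-cluster dispersions D_t.
        D = [sum(_sqdist(X[i], centers[labels[i]], g) for i in range(len(X))) for g in groups]
        weights = group_weights(D, beta)
    return labels, centers, weights
```

On a toy matrix whose first two columns separate the clusters and whose last two columns are noise, calling `group_weighted_kmeans(X, [[0, 1], [2, 3]], k=2)` should give the first group the larger weight, because its within-cluster dispersion is smaller. For blog data, `groups` would instead hold three index lists, one per text-block group.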
Keywords :
Web sites; data mining; iterative methods; pattern clustering; blog data clustering; blog data features; clustering algorithm; group weight variables; k-means clustering algorithm; k-means iterative clustering process; objective function; text blocks automated weighting; text blocks structure information; Cybernetics; Information services; Internet; Machine learning; Web sites; Blog; auto-weighted; clustering; web mining;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Machine Learning and Cybernetics, 2009 International Conference on
Conference_Location :
Baoding
Print_ISBN :
978-1-4244-3702-3
Electronic_ISBN :
978-1-4244-3703-0
Type :
conf
DOI :
10.1109/ICMLC.2009.5212352
Filename :
5212352