DocumentCode
498943
Title
Improved blog clustering through automated weighting of text blocks
Author
Li, Hongbo ; Ye, Yunming ; Huang, Joshu Zhexue
Author_Institution
Shenzhen Grad. Sch., Harbin Inst. of Technol., Harbin, China
Volume
3
fYear
2009
fDate
12-15 July 2009
Firstpage
1586
Lastpage
1591
Abstract
In this paper, a new clustering algorithm is proposed for blog data. Taking into account the structure of the text blocks in blog data, we divide the features of blog data into three groups and extend the k-means clustering algorithm to automatically calculate a weight for each feature group during clustering. We introduce a new objective function with group weight variables and use the Lagrange multiplier method to derive the formula for calculating the group weights. This formula is added as a new step in the standard iterative k-means process, so that the group weights are computed automatically from the distribution of the features. The extended process is guaranteed to converge to a locally optimal solution. Experimental results on different blog data sets show that the new algorithm performs better than k-means without group feature weighting.
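The abstract describes the weighting step only in words. The sketch below is a minimal illustration, assuming a W-k-means-style objective: each feature group G_t gets a weight w_t (the weights sum to 1) raised to an exponent beta > 1, and the distance between a point and a centroid is the sum over groups of w_t**beta times the squared Euclidean distance on that group's features. Minimizing this objective over the weights with a Lagrange multiplier gives w_t = 1 / sum_s (D_t / D_s)**(1 / (beta - 1)), where D_t is the within-cluster dispersion of group t. The paper's exact objective and weight formula may differ; beta, the zero-dispersion convention, the farthest-point initialisation and all function names here are illustrative assumptions, not the authors' code.

```python
# Sketch of k-means with automatically weighted feature groups.
# ASSUMPTION: W-k-means-style objective, not necessarily the paper's exact one:
#   P = sum_l sum_{i in cluster l} sum_t w_t**beta * ||x_i[G_t] - z_l[G_t]||^2,
#   subject to sum_t w_t = 1 and beta > 1.


def group_sqdist(x, z, group):
    """Squared Euclidean distance restricted to the feature indices in `group`."""
    return sum((x[j] - z[j]) ** 2 for j in group)


def weighted_dist(x, z, groups, weights, beta):
    """Group-weighted distance: sum_t w_t**beta * ||x[G_t] - z[G_t]||^2."""
    return sum((w ** beta) * group_sqdist(x, z, g) for g, w in zip(groups, weights))


def group_weights_from_dispersion(disp, beta):
    """Lagrange-multiplier solution: w_t = 1 / sum_s (D_t / D_s)**(1 / (beta - 1)).

    Groups with zero dispersion get weight 0 (a convention borrowed from
    W-k-means); if every group has zero dispersion, weights are uniform.
    """
    positive = [d for d in disp if d > 0]
    if not positive:
        return [1.0 / len(disp)] * len(disp)
    e = 1.0 / (beta - 1.0)
    return [0.0 if d == 0 else 1.0 / sum((d / s) ** e for s in positive) for d in disp]


def farthest_point_init(X, k):
    """Deterministic seeding (assumed): start from X[0], then add the farthest point."""
    centroids = [list(X[0])]
    while len(centroids) < k:
        far = max(X, key=lambda x: min(group_sqdist(x, c, range(len(x))) for c in centroids))
        centroids.append(list(far))
    return centroids


def group_weighted_kmeans(X, k, groups, beta=2.0, max_iter=100):
    """Return (labels, centroids, group weights) for data X (list of feature lists)."""
    if beta <= 1.0:
        raise ValueError("this weight formula needs beta > 1")
    dim = len(X[0])
    centroids = farthest_point_init(X, k)
    weights = [1.0 / len(groups)] * len(groups)  # start with equal group weights
    labels = None
    for _ in range(max_iter):
        # Step 1: assign each point to its nearest centroid under the weighted distance.
        new_labels = []
        for x in X:
            dists = [weighted_dist(x, c, groups, weights, beta) for c in centroids]
            new_labels.append(dists.index(min(dists)))
        if new_labels == labels:
            break  # assignments unchanged, so centroids and weights are unchanged too
        labels = new_labels
        # Step 2: move each centroid to the mean of its members (empty clusters keep theirs).
        for l in range(k):
            members = [x for x, lab in zip(X, labels) if lab == l]
            if members:
                centroids[l] = [sum(m[j] for m in members) / len(members) for j in range(dim)]
        # Step 3 (the added step): recompute group weights from the dispersions D_t.
        disp = [sum(group_sqdist(x, centroids[lab], g) for x, lab in zip(X, labels))
                for g in groups]
        weights = group_weights_from_dispersion(disp, beta)
    return labels, centroids, weights


if __name__ == "__main__":
    # Hypothetical toy data: features 0-1 form one "text block" group, feature 2 another.
    X = [[0.0, 0.0, 5.0], [0.1, 0.2, -5.0], [0.2, 0.1, 4.0],
         [10.0, 10.0, -4.0], [10.1, 9.9, 5.0], [9.9, 10.2, -5.0]]
    labels, _, weights = group_weighted_kmeans(X, 2, groups=[[0, 1], [2]])
    print(labels, [round(w, 3) for w in weights])
```

In the toy run, feature 2 is noise that varies within both clusters, so its dispersion is large and the sketch drives its group weight toward zero, while the group that actually separates the clusters receives almost all of the weight. Larger values of beta spread the weights more evenly; values close to 1 concentrate them on the lowest-dispersion group.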
Keywords
Web sites; data mining; iterative methods; pattern clustering; blog data clustering; blog data features; clustering algorithm; group weight variables; k-means clustering algorithm; k-means iterative clustering process; objective function; text blocks automated weighting; text blocks structure information; Cybernetics; Information services; Internet; Machine learning; Blog; auto-weighted; clustering; web mining
fLanguage
English
Publisher
IEEE
Conference_Title
2009 International Conference on Machine Learning and Cybernetics
Conference_Location
Baoding
Print_ISBN
978-1-4244-3702-3
Electronic_ISBN
978-1-4244-3703-0
Type
conf
DOI
10.1109/ICMLC.2009.5212352
Filename
5212352