  • DocumentCode
    498943
  • Title
    Improved blog clustering through automated weighting of text blocks
  • Author
    Li, Hongbo ; Ye, Yunming ; Huang, Joshu Zhexue
  • Author_Institution
    Shenzhen Grad. Sch., Harbin Inst. of Technol., Harbin, China
  • Volume
    3
  • fYear
    2009
  • fDate
    12-15 July 2009
  • Firstpage
    1586
  • Lastpage
    1591
  • Abstract
    This paper proposes a new clustering algorithm for blog data. Using the structure information of text blocks in blog data, we divide the features into three groups and extend the k-means clustering algorithm to automatically calculate a weight for each feature group during clustering. We introduce a new objective function with group weight variables and use the Lagrangian method to derive the formula for the group weights. This formula is added as a new step in the standard k-means iterative process, so that the group weights are computed automatically from the distribution of the features. The new process is guaranteed to converge to a locally optimal solution. Experimental results show that the new algorithm performs better than k-means without group feature weighting on different blog data sets.
  • Keywords
    Web sites; data mining; iterative methods; pattern clustering; blog data clustering; blog data features; clustering algorithm; group weight variables; k-means clustering algorithm; k-means iterative clustering process; objective function; text blocks automated weighting; text blocks structure information; Cybernetics; Information services; Internet; Machine learning; Web sites; Blog; auto-weighted; clustering; web mining;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Cybernetics, 2009 International Conference on
  • Conference_Location
    Baoding
  • Print_ISBN
    978-1-4244-3702-3
  • Electronic_ISBN
    978-1-4244-3703-0
  • Type
    conf
  • DOI
    10.1109/ICMLC.2009.5212352
  • Filename
    5212352
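
The Abstract describes k-means extended with one automatically computed weight per feature group. The record does not give the paper's objective function or weight formula, so the following is only a minimal sketch under stated assumptions: the objective is taken to be the sum over feature groups of w_g^beta times the within-cluster squared distance D_g of that group, subject to the weights summing to 1, which by the Lagrangian method gives w_g proportional to D_g^(-1/(beta-1)). The function name `group_weighted_kmeans`, the exponent `beta`, and the `init` and `seed` parameters are hypothetical and not taken from the paper.

```python
import numpy as np

def group_weighted_kmeans(X, groups, k, beta=2.0, n_iter=100, init=None, seed=0):
    """Sketch of k-means with one weight per feature group (assumed objective).

    X      : (n, d) data matrix
    groups : list of column-index lists that partition the d features
    k      : number of clusters
    beta   : exponent on the group weights, must be > 1 (assumption)
    init   : optional (k, d) array of starting centers
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    rng = np.random.default_rng(seed)
    if init is None:
        centers = X[rng.choice(n, size=k, replace=False)].copy()
    else:
        centers = np.array(init, dtype=float)

    # Map every column to the index of the group it belongs to.
    col_group = np.empty(d, dtype=int)
    for g, idx in enumerate(groups):
        col_group[idx] = g

    w = np.full(len(groups), 1.0 / len(groups))  # start with equal group weights
    labels = None
    for _ in range(n_iter):
        # Step 1: assign each object to the nearest center under weighted distance.
        fw = w[col_group] ** beta                         # per-column weight
        sq = (X[:, None, :] - centers[None, :, :]) ** 2   # shape (n, k, d)
        new_labels = (sq * fw).sum(axis=2).argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                                         # assignments are stable
        labels = new_labels

        # Step 2: move each center to the mean of its members.
        for l in range(k):
            members = X[labels == l]
            if len(members) > 0:
                centers[l] = members.mean(axis=0)

        # Step 3 (the added step): update group weights from the dispersions.
        resid = (X - centers[labels]) ** 2
        D = np.array([resid[:, idx].sum() for idx in groups])
        D = np.maximum(D, 1e-12)                          # guard against division by zero
        inv = D ** (-1.0 / (beta - 1.0))
        w = inv / inv.sum()                               # weights sum to 1
    return labels, centers, w
```

Under this assumed formula, a feature group whose values are tightly concentrated around the cluster centers receives a larger weight than a group that behaves like noise, which is one plausible reading of "according to the distribution of features" in the Abstract.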