• DocumentCode
    2834483
  • Title

    An Improved Hierarchical K-Means Algorithm for Web Document Clustering

  • Author

    Liu, Yongxin ; Liu, Zhijing

  • Author_Institution
    Sch. of Comput. Sci. & Technol., Xidian Univ., Xian
  • fYear
    2008
  • fDate
    Aug. 29 2008-Sept. 2 2008
  • Firstpage
    606
  • Lastpage
    610
  • Abstract
    To address the major challenges of current Web document clustering, i.e., the huge volume of documents and high-dimensional processing, we propose a simple agglomerative hierarchical k-means clustering (SAHKC) algorithm based on the H-K (hierarchical k-means) algorithm, and we use a new model, named the multiple feature vector space model (MFVSM), to describe Web documents. Experimental results indicate that the MFVSM helps improve the quality of the clustering result and that, compared with the H-K algorithm, the SAHKC algorithm's running time is reduced by nearly 30%, while the average precision of the clustering result is reduced by only about 10%.
  • Keywords
    Internet; document handling; pattern clustering; Web document clustering; multiple feature vector space model; simple agglomerative hierarchical k-means clustering; Clustering algorithms; Clustering methods; Computer science; Data mining; Databases; Greedy algorithms; HTML; Information technology; Partitioning algorithms; Web mining; K-Means; vector space model (VSM)
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science and Information Technology, 2008. ICCSIT '08. International Conference on
  • Conference_Location
    Singapore
  • Print_ISBN
    978-0-7695-3308-7
  • Type

    conf

  • DOI
    10.1109/ICCSIT.2008.152
  • Filename
    4624939