• DocumentCode
    3064886
  • Title
    Hybrid Bisect K-Means Clustering Algorithm
  • Author
    Murugesan, Keerthiram; Zhang, Jun
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Kentucky, Lexington, KY, USA
  • fYear
    2011
  • fDate
    29-31 July 2011
  • Firstpage
    216
  • Lastpage
    219
  • Abstract
    In this paper, we present a hybrid clustering algorithm that combines divisive and agglomerative hierarchical clustering. Our method uses bisect K-means as the divisive algorithm and the Unweighted Pair Group Method with Arithmetic Mean (UPGMA) as the agglomerative algorithm. First, we cluster the document collection using bisect K-means with a value K´ that is greater than the target number of clusters, K. Second, we calculate the centroids of the K´ clusters obtained in the first step. Then we apply UPGMA to these centroids to obtain K clusters. If two centroids end up in the same UPGMA cluster, all of their documents are assigned to the same final cluster. We compared the quality of the clusters generated by bisect K-means and by the proposed hybrid algorithm using several cluster evaluation metrics. Our experimental results show that the proposed method outperforms the standard bisect K-means algorithm.
  • Keywords
    document handling; pattern clustering; UPGMA agglomerative hierarchical algorithm; agglomerative hierarchical clustering algorithm; arithmetic mean; document clustering; k-means algorithm; unweighted pair group method; Clustering algorithms; Complexity theory; Computer science; Entropy; Hybrid power systems; Measurement; Partitioning algorithms; Bisect K-means; hybrid algorithm
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Business Computing and Global Informatization (BCGIN), 2011 International Conference on
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-1-4577-0788-9
  • Electronic_ISBN
    978-0-7695-4464-9
  • Type
    conf
  • DOI
    10.1109/BCGIn.2011.62
  • Filename
    6003884
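
The two-stage procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes points are numeric tuples compared by Euclidean distance (the paper works on documents and may use a different similarity measure), it always bisects the largest cluster, and all function names (`two_means`, `bisect_kmeans`, `upgma`, `hybrid_bisect_kmeans`) are hypothetical.

```python
import random


def dist(a, b):
    """Euclidean distance between two equal-length tuples (an assumption)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def centroid(points):
    """Component-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def two_means(points, rng, iters=20):
    """Split one cluster in two with plain 2-means; halve it if a side empties."""
    c0, c1 = rng.sample(points, 2)
    left, right = points, []
    for _ in range(iters):
        left = [p for p in points if dist(p, c0) <= dist(p, c1)]
        right = [p for p in points if dist(p, c0) > dist(p, c1)]
        if not left or not right:
            half = len(points) // 2
            return points[:half], points[half:]
        c0, c1 = centroid(left), centroid(right)
    return left, right


def bisect_kmeans(points, k_prime, seed=0):
    """Divisive step: keep bisecting the largest cluster until K' clusters exist."""
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < k_prime:
        clusters.sort(key=len)
        biggest = clusters.pop()
        if len(biggest) < 2:  # nothing left to split
            clusters.append(biggest)
            break
        clusters.extend(two_means(biggest, rng))
    return clusters


def upgma(items, k):
    """Agglomerative step: average-linkage merging of item indices down to k groups."""
    groups = [[i] for i in range(len(items))]

    def avg_link(g, h):
        total = sum(dist(items[i], items[j]) for i in g for j in h)
        return total / (len(g) * len(h))

    while len(groups) > k:
        pairs = ((a, b) for a in range(len(groups))
                 for b in range(a + 1, len(groups)))
        a, b = min(pairs, key=lambda ab: avg_link(groups[ab[0]], groups[ab[1]]))
        merged = groups.pop(b)  # b > a, so index a stays valid
        groups[a].extend(merged)
    return groups


def hybrid_bisect_kmeans(points, k, k_prime, seed=0):
    """Bisect to K' clusters, run UPGMA on their centroids, then merge members."""
    small = bisect_kmeans(points, k_prime, seed)
    groups = upgma([centroid(c) for c in small], k)
    return [[p for i in g for p in small[i]] for g in groups]
```

Calling `hybrid_bisect_kmeans(points, k=3, k_prime=6)` over-partitions the data into six small clusters and then merges them into three; every point follows the centroid of its small cluster into the final group, as the abstract describes.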