• DocumentCode
    2334277
  • Title

    Efficient yet accurate clustering

  • Author

    Dash, Manoranjan ; Tan, Kian Lee ; Liu, Huan

  • Author_Institution
    Sch. of Comput., Nat. Univ. of Singapore, Singapore
  • fYear
    2001
  • fDate
    2001
  • Firstpage
    99
  • Lastpage
    106
  • Abstract
    The authors show that most hierarchical agglomerative clustering (HAC) algorithms follow a 90-10 rule: roughly the first 90% of iterations merge cluster pairs whose dissimilarity is less than 10% of the maximum dissimilarity. They propose two algorithms, 2-phase and nested, based on partially overlapping partitioning (POP). To handle high-dimensional data efficiently, they propose a tree structure particularly suitable for POP. Extensive experiments show that the proposed algorithms significantly reduce the time and memory requirements of existing HAC algorithms without compromising accuracy.
  • Keywords
    data analysis; pattern clustering; tree data structures; very large databases; 90-10 rule; HAC algorithms; POP; cluster pair merging; efficient accurate clustering; hierarchical agglomerative clustering algorithms; high-dimensional data; maximum dissimilarity; memory requirement; partially overlapping partitioning; tree structure; Clustering algorithms; Computational efficiency; Data mining; Iterative algorithms; Labeling; Partitioning algorithms; Robustness; Sampling methods; Tree data structures; World Wide Web
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on
  • Conference_Location
    San Jose, CA
  • Print_ISBN
    0-7695-1119-8
  • Type

    conf

  • DOI
    10.1109/ICDM.2001.989506
  • Filename
    989506
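
The 90-10 rule described in the abstract can be checked empirically on a small data set. The sketch below is not the authors' 2-phase or nested algorithm and does not use POP; it is a naive single-linkage HAC, written only to record the dissimilarity at each merge. The function names `merge_dissimilarities` and `fraction_below` are hypothetical, and it assumes that "maximum dissimilarity" means the largest dissimilarity at which any merge occurs, which the abstract does not specify.

```python
import math


def merge_dissimilarities(points):
    """Naive single-linkage HAC (illustrative only, cubic or worse in time).

    Returns the dissimilarity of each merge, in the order the merges occur.
    """
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append(d)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return merges


def fraction_below(merges, ratio=0.10):
    """Fraction of merges whose dissimilarity is below ratio * max merge dissimilarity.

    Assumption: the maximum is taken over merge dissimilarities.
    """
    threshold = ratio * max(merges)
    return sum(d < threshold for d in merges) / len(merges)


if __name__ == "__main__":
    # Two tight, well-separated groups of four points each (made-up data).
    pts = [(0, 0), (0, 1), (1, 0), (1, 1),
           (100, 100), (100, 101), (101, 100), (101, 101)]
    merges = merge_dissimilarities(pts)
    print(fraction_below(merges))
```

On data with well-separated groups, most merges happen at small dissimilarities and only the last few join distant clusters, which is the pattern the rule describes. How closely real data sets follow the 90% and 10% figures is a claim of the paper and is not established by this sketch.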