Title :
Efficient yet accurate clustering
Author :
Dash, Manoranjan ; Tan, Kian Lee ; Liu, Huan
Author_Institution :
Sch. of Comput., Nat. Univ. of Singapore, Singapore
Abstract :
We show that most hierarchical agglomerative clustering (HAC) algorithms follow a 90-10 rule: roughly the first 90% of iterations merge cluster pairs whose dissimilarity is less than 10% of the maximum dissimilarity. We propose two algorithms, 2-phase and nested, both based on partially overlapping partitioning (POP). To handle high-dimensional data efficiently, we propose a tree structure particularly suitable for POP. Extensive experiments show that the proposed algorithms significantly reduce the time and memory requirements of existing HAC algorithms without compromising accuracy.
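The 90-10 rule stated in the abstract can be checked empirically on a small data set. The following is a minimal sketch, not the authors' implementation: it assumes naive single-link HAC on 2-D points, and it takes "maximum dissimilarity" to mean the largest merge dissimilarity observed during the run, which is an interpretation rather than something the abstract specifies. The function names `merge_dissimilarities` and `fraction_below` are hypothetical.

```python
import math


def merge_dissimilarities(points):
    """Run naive single-link HAC and return the dissimilarity of each merge, in order."""
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        best = None  # (dissimilarity, index_i, index_j) of the closest cluster pair
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single link: the distance between the two closest members.
                d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append(d)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return merges


def fraction_below(merges, ratio=0.10):
    """Fraction of merges whose dissimilarity is below ratio * (largest merge dissimilarity)."""
    threshold = ratio * max(merges)
    return sum(1 for d in merges if d < threshold) / len(merges)


if __name__ == "__main__":
    # Toy data (an assumption for illustration): two tight groups far apart.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1),
           (100, 100), (100, 101), (101, 100), (101, 101)]
    merges = merge_dissimilarities(pts)
    print(f"{fraction_below(merges):.3f} of merges fall below 10% of the maximum")
```

On this toy input, six of the seven merges happen at dissimilarity 1, and only the final merge joins the two distant groups, so most iterations sit far below the 10% threshold. Real data sets will give different fractions, and other linkage criteria may behave differently.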
Keywords :
data analysis; pattern clustering; tree data structures; very large databases; 90-10 rule; HAC algorithms; POP; cluster pair merging; efficient accurate clustering; hierarchical agglomerative clustering algorithms; high-dimensional data; maximum dissimilarity; memory requirement; partially overlapping partitioning; tree structure; Clustering algorithms; Computational efficiency; Data mining; Iterative algorithms; Labeling; Partitioning algorithms; Robustness; Sampling methods; Tree data structures; World Wide Web
Conference_Title :
Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on
Conference_Location :
San Jose, CA
Print_ISBN :
0-7695-1119-8
DOI :
10.1109/ICDM.2001.989506