Title :
Scalable Clustering with smoka
Author_Institution :
Dept. of Math. & Stat., UMBC, Baltimore, MD
Abstract :
The paper reports a multi-step clustering procedure equipped with a divergence (a distance-like function derived from a convex function). The first step of the procedure is a BIRCH-like algorithm capable of converting very large datasets into "summaries" that require much less computer memory. The second step is the principal direction divisive partitioning (PDDP) algorithm, which partitions the set of "summaries" into k clusters. This partition is the input for a smoothed k-means-based clustering algorithm (smoka). The final partition of "summaries" generated by smoka induces a partition of the original dataset. Preliminary numerical experiments with text collections reported in the paper demonstrate smoka's remarkable accuracy and speed of convergence.
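The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation, and several pieces are assumptions: the squared Euclidean distance stands in for the divergence, the point-to-summary mapping is given by hand instead of coming from a BIRCH-like pass, the initial centroids are hand-picked instead of coming from PDDP, and the smoothing is a softmax-weighted centroid update with a hypothetical parameter `s`. All function and variable names are illustrative.

```python
import numpy as np

def smoothed_kmeans(summaries, weights, centroids, s=1.0, iters=50):
    """Soft-assignment k-means on weighted summaries (assumed update rule)."""
    centroids = centroids.astype(float).copy()
    for _ in range(iters):
        # d[i, l] = squared Euclidean distance from summary i to centroid l
        d = ((summaries[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        # Softmax of -d/s over centroids; shift by the row minimum for stability
        rho = np.exp(-(d - d.min(axis=1, keepdims=True)) / s)
        rho /= rho.sum(axis=1, keepdims=True)
        # Each centroid becomes a weighted mean of all summaries
        w = rho * weights[:, None]
        centroids = (w.T @ summaries) / np.maximum(w.sum(axis=0), 1e-12)[:, None]
    # Final hard labels: nearest centroid for each summary
    d = ((summaries[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, d.argmin(axis=1)

def induce_partition(point_to_summary, summary_labels):
    """Each original point inherits the cluster of the summary it belongs to."""
    return summary_labels[point_to_summary]

# Toy data: six points, compressed by hand into four summaries
points = np.array([[0, 0], [0, 1], [1, 0],
                   [10, 10], [10, 11], [11, 10]], dtype=float)
point_to_summary = np.array([0, 0, 1, 2, 2, 3])

# A summary here is the mean of its points, weighted by how many it holds
counts = np.bincount(point_to_summary)
summaries = np.zeros((counts.size, points.shape[1]))
np.add.at(summaries, point_to_summary, points)
summaries /= counts[:, None]

# Hand-picked initial centroids (a stand-in for the PDDP step)
initial = summaries[[0, 2]]
centroids, summary_labels = smoothed_kmeans(
    summaries, counts.astype(float), initial)
labels = induce_partition(point_to_summary, summary_labels)
```

Only the four summaries and their weights enter the iterative step; the original points are touched once to build the summaries and once to read off the induced labels, which is the source of the memory savings the abstract describes.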
Keywords :
pattern clustering; principal direction divisive partitioning algorithm; smoothed k-means based clustering algorithm; Algorithm design and analysis; Annealing; Clustering algorithms; Convergence of numerical methods; Costs; Iterative algorithms; Jacobian matrices; Mathematics; Partitioning algorithms; Statistics;
Conference_Title :
Computing: Theory and Applications, 2007. ICCTA '07. International Conference on
Conference_Location :
Kolkata
Print_ISBN :
0-7695-2770-1
DOI :
10.1109/ICCTA.2007.114