DocumentCode :
1925218
Title :
Scalable Clustering with smoka
Author :
Kogan, Jacob
Author_Institution :
Dept. of Math. & Stat., UMBC, Baltimore, MD
fYear :
2007
fDate :
5-7 March 2007
Firstpage :
299
Lastpage :
303
Abstract :
The paper reports a multi-step clustering procedure equipped with a divergence (a distance-like function derived from a convex function). The first step of the procedure is a BIRCH-like algorithm capable of converting very large datasets to "summaries" that require much less computer memory. The second step is the principal direction divisive partitioning algorithm (PDDP), which partitions the set of "summaries" into k clusters. This partition is the input for a smoothed k-means based clustering algorithm (smoka). The final partition of "summaries" generated by smoka induces a partition of the original dataset. Preliminary numerical experiments with text collections reported in the paper demonstrate smoka's remarkable accuracy and speed of convergence.
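The smoothing step named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes squared Euclidean distance in place of the paper's general divergence, represents each "summary" as a hypothetical (point, mass) pair, and takes the initial centroids as given rather than computing them with PDDP. The function names and the smoothing parameter `s` are illustrative choices.

```python
import math


def soft_weights(point, centroids, s):
    """Softmax weights of one point over all centroids.

    Assumption: squared Euclidean distance stands in for the divergence.
    """
    d = [sum((p - c) ** 2 for p, c in zip(point, cen)) for cen in centroids]
    shift = min(d)  # subtract the smallest distance for numerical stability
    e = [math.exp(-(di - shift) / s) for di in d]
    total = sum(e)
    return [ei / total for ei in e]


def smoothed_kmeans(points, masses, centroids, s=0.1, iters=50):
    """Fixed-point iteration for a smoothed k-means objective.

    Each centroid becomes the mean of all points, weighted by summary mass
    times soft membership. As s approaches 0 the weights approach hard
    k-means assignments.
    """
    dim = len(points[0])
    for _ in range(iters):
        num = [[0.0] * dim for _ in centroids]
        den = [0.0] * len(centroids)
        for x, m in zip(points, masses):
            w = soft_weights(x, centroids, s)
            for l, wl in enumerate(w):
                den[l] += m * wl
                for j in range(dim):
                    num[l][j] += m * wl * x[j]
        # Keep the old centroid if it received no weight at all.
        centroids = [
            [n / d for n in num_l] if d > 0 else list(c)
            for num_l, d, c in zip(num, den, centroids)
        ]
    return centroids


def assign(points, centroids):
    """Hard partition: index of the nearest centroid for every point."""
    labels = []
    for x in points:
        d = [sum((p - c) ** 2 for p, c in zip(x, cen)) for cen in centroids]
        labels.append(d.index(min(d)))
    return labels
```

Calling `smoothed_kmeans(points, masses, initial_centroids)` and then `assign(points, centroids)` gives a hard partition of the summaries. Mapping each original record to the cluster of its summary then induces a partition of the full dataset, as the abstract describes.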
Keywords :
pattern clustering; principal direction divisive partitioning algorithm; smoothed k-means based clustering algorithm; Algorithm design and analysis; Annealing; Clustering algorithms; Convergence of numerical methods; Costs; Iterative algorithms; Jacobian matrices; Mathematics; Partitioning algorithms; Statistics;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computing: Theory and Applications, 2007. ICCTA '07. International Conference on
Conference_Location :
Kolkata
Print_ISBN :
0-7695-2770-1
Type :
conf
DOI :
10.1109/ICCTA.2007.114
Filename :
4127385