DocumentCode :
2076019
Title :
Seeding Cluster Centers of K-means Clustering through Median Projection
Author :
Suresh, Lalith ; Simha, Jay B. ; Velur, Rajappa
Author_Institution :
Dept. of CSE, Cambridge Inst. of Technol., Bangalore, India
fYear :
2010
fDate :
15-18 Feb. 2010
Firstpage :
217
Lastpage :
222
Abstract :
K-means clustering is an important algorithm for identifying structure in data and is among the simplest clustering algorithms. It takes a predefined number of clusters as input. The original algorithm selects cluster centers at random and iteratively improves the result. This approach has two major limitations. First, the number of clusters must be specified in advance, which is difficult because the underlying structure is not known. Second, random selection of cluster centers can result in convergence to local optima. In addition, most K-means implementations use memory-based structures, which limits the data size. In this work, a novel approach to seeding the clusters from the latent data structure is proposed. It is expected to minimize the need to specify the number of clusters a priori and to reduce convergence time by providing near-optimal cluster centers. The algorithm is implemented in SQL to provide a disk-based solution that can handle large data sets that do not fit into memory. The proposed solution was tested on both row-store and column-store databases. The results are promising, and work is in progress to test the approach in different domains.
Keywords :
SQL; data structures; pattern clustering; K-means clustering; column store databases; disk based solution; latent data structure; median projection; random selection; row store databases; seeding cluster centers; Algorithm design and analysis; Clustering algorithms; Competitive intelligence; Convergence; Databases; Intelligent structures; Iterative algorithms; Partitioning algorithms; Software systems; Testing; Clustering; Median projection and Median Selection; Multidimensional data; prediction model
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Complex, Intelligent and Software Intensive Systems (CISIS), 2010 International Conference on
Conference_Location :
Krakow
Print_ISBN :
978-1-4244-5917-9
Type :
conf
DOI :
10.1109/CISIS.2010.133
Filename :
5447429