Title :
A k-Means-Based Projected Clustering Algorithm
Author :
Sun, Yufen ; Liu, Gang ; Xu, Kun
Author_Institution :
College of Computer Science and Technology, Wuhan University of Technology, Wuhan, China
Abstract :
In high-dimensional data spaces, clusters are likely to exist in different subspaces. K-means is a classic clustering algorithm, but it cannot find subspace clusters. In this paper, an algorithm called GKM is designed to generalize the k-means algorithm to high-dimensional data. In the objective function of GKM, we associate a weight vector with each cluster to indicate which dimensions are relevant to that cluster. To prevent the value of the objective function from decreasing merely because dimensions are eliminated, virtual dimensions are added to the objective function. The values of data points on the virtual dimensions are set artificially so that the objective function is minimized when the real subspace clusters, or the clusters in the original space, are found. The GKM algorithm preserves the advantages of k-means. It can identify subspace clusters with linear time complexity. Our performance study on a synthetic dataset and a real dataset demonstrates the efficiency and effectiveness of GKM.
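The abstract does not give GKM's actual objective function, so the following is only a hedged illustration of the general idea, not the authors' method. It assumes binary per-cluster weights (1 = relevant dimension, 0 = eliminated) and stands in for the virtual dimensions with a single hypothetical constant, `virtual_cost`, charged for every eliminated dimension. Under that assumption, dropping a dimension never lowers the objective for free: a dimension is dropped only when its within-cluster squared deviation exceeds the virtual cost. The names `point_cost`, `projected_kmeans`, `virtual_cost`, and `init` are all hypothetical.

```python
import random


def sq(a, b):
    """Squared difference on one dimension."""
    return (a - b) ** 2


def point_cost(x, mean, weights, virtual_cost):
    """Cost of a point under one cluster (hypothetical formulation).

    Relevant dimensions (weight 1) add the squared deviation from the mean;
    eliminated dimensions (weight 0) add a fixed virtual-dimension cost.
    """
    return sum(sq(xj, mj) if wj else virtual_cost
               for xj, mj, wj in zip(x, mean, weights))


def projected_kmeans(points, k, virtual_cost, iters=20, init=None, seed=0):
    """k-means with per-cluster binary dimension weights (illustrative sketch)."""
    d = len(points[0])
    if init is not None:
        means = [list(m) for m in init]
    else:
        means = [list(p) for p in random.Random(seed).sample(points, k)]
    weights = [[1] * d for _ in range(k)]  # start in the full original space
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins the cluster with the lowest cost.
        labels = [min(range(k),
                      key=lambda c: point_cost(p, means[c], weights[c], virtual_cost))
                  for p in points]
        # Update step: recompute means, then re-select relevant dimensions.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if not members:
                continue  # keep the old mean and weights for an empty cluster
            n = len(members)
            means[c] = [sum(p[j] for p in members) / n for j in range(d)]
            # Keep dimension j only if its total squared deviation is cheaper
            # than replacing it with a virtual dimension (n * virtual_cost).
            weights[c] = [1 if sum(sq(p[j], means[c][j]) for p in members)
                          < n * virtual_cost else 0
                          for j in range(d)]
    return labels, means, weights
```

For example, with one group of points that is tight on dimensions 0 and 1 but spread on dimension 2, and another that is tight on dimensions 1 and 2 but spread on dimension 0, this sketch recovers each group together with its own relevant dimensions. Both steps can only lower (or keep) the assumed objective, mirroring the usual k-means convergence argument.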
Keywords :
data mining; pattern clustering; generalize k-means algorithm; k-means clustering; objective function; projected clustering algorithm; virtual dimensions; weight vector; Algorithm design and analysis; Clustering algorithms; Data mining; Educational institutions; Intelligent transportation systems; Iterative algorithms; Machine learning algorithms; Partitioning algorithms; Space technology; Sun; high dimensions; k-means; projected clustering
Conference_Title :
2010 Third International Joint Conference on Computational Science and Optimization (CSO)
Conference_Location :
Huangshan, Anhui
Print_ISBN :
978-1-4244-6812-6
Electronic_ISBN :
978-1-4244-6813-3
DOI :
10.1109/CSO.2010.119