DocumentCode :
1624568
Title :
A Study on Outlier distance and SSE with multidimensional datasets in K-means clustering
Author :
Rajee, A.M. ; Francis, F. Sagayaraj
Author_Institution :
Dept. of CSE, Pondicherry Eng. Coll., Puducherry, India
fYear :
2013
Firstpage :
33
Lastpage :
36
Abstract :
Clustering is a well-known technique in data mining, and the K-means algorithm is one of the most widely used clustering methods. It is popular because it is conceptually simple, computationally fast, and memory efficient. This paper examines how noise points limit the efficacy of the K-means algorithm by analyzing them in terms of the sum-of-squared error (SSE), which remains the most popular validation measure for K-means. Experimental studies were carried out on synthetic data sets of multiple dimensions and cluster sizes. Numerous noise points were added to the K clusters, and the effect of noise distance on SSE was examined. From the results, we infer that the distance of a noise point from the cluster center influences SSE. This correlative study is significant because the K-means algorithm assumes that the number of clusters in the data is known in advance, which is not necessarily true in real-world applications. The study probes the pathognomonic role of noise points in the clustering outcome, which should in turn help to provide better results in real-world applications.
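The relationship the abstract describes, between the distance of a noise point from a cluster center and the SSE, can be illustrated with a minimal sketch. This is not the authors' experimental setup: the function names `sse` and `sse_with_noise`, the single 3-dimensional Gaussian cluster of 50 points, and the chosen distances are all assumptions made for illustration only.

```python
import numpy as np

def sse(points, labels):
    """Sum of squared distances from each point to the centroid of its cluster."""
    total = 0.0
    for k in np.unique(labels):
        members = points[labels == k]
        centroid = members.mean(axis=0)
        total += float(((members - centroid) ** 2).sum())
    return total

def sse_with_noise(cluster, distance):
    """SSE of a single cluster after adding one noise point placed
    `distance` away from the original centroid (hypothetical helper)."""
    centroid = cluster.mean(axis=0)
    direction = np.zeros(cluster.shape[1])
    direction[0] = 1.0  # unit vector along the first axis (arbitrary choice)
    noisy = np.vstack([cluster, centroid + distance * direction])
    return sse(noisy, np.zeros(len(noisy), dtype=int))

# Synthetic 3-dimensional cluster (assumed data, not from the paper).
rng = np.random.default_rng(0)
cluster = rng.normal(size=(50, 3))

distances = [1.0, 5.0, 10.0, 20.0]
results = [sse_with_noise(cluster, d) for d in distances]
for d, value in zip(distances, results):
    print(f"noise distance {d:5.1f} -> SSE {value:.2f}")
```

In this simplified one-cluster setting, adding a point at distance d from the centroid of n points raises the SSE by n/(n+1) · d², so the SSE grows quadratically with the noise distance. With several clusters and a full K-means run, noise points can also shift centroids and change assignments, which this sketch does not model.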
Keywords :
data mining; pattern clustering; statistical analysis; SSE; clustering outcome; k-means clustering technique; multidimensional dataset; noise distance; outlier distance; pathognomonic role; sum-of-squared error; Noise; Three-dimensional displays; K-means; data clustering; multidimensional data sets; outliers
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Advanced Computing (ICoAC), 2013 Fifth International Conference on
Conference_Location :
Chennai
Print_ISBN :
978-1-4799-3447-8
Type :
conf
DOI :
10.1109/ICoAC.2013.6921923
Filename :
6921923