Title :
Feature Selection with Efficient Initialization of Clusters Centers for High Dimensional Data Clustering
Author :
Rajput, Dharmveer Singh ; Singh, P.K. ; Bhattacharya, M.
Author_Institution :
ABV - Indian Inst. of Inf. Technol. & Manage., Gwalior, India
Abstract :
Most traditional data clustering algorithms suffer from two main problems: (i) the curse of dimensionality and (ii) random initialization of cluster centers, which leads to locally optimal clustering. In this paper, we propose a technique for selecting the most relevant dimensions of a data set and for efficiently initializing the cluster centers. The proposed technique uses the median absolute deviation (MAD) to select the relevant dimensions of the data set and then uses the most frequent value (MODE) of the selected dimensions to determine the initial cluster centers. Finally, these initial cluster centers are used in the k-means algorithm for optimum clustering. Empirical results show that the algorithm produces comparatively efficient results. The quality measures also validate the good quality of the obtained results.
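The three stages named in the abstract (MAD-based dimension selection, MODE-based initial centers, k-means) could be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not say how MAD scores are turned into a selection or how k centers are derived from modes, so the "keep the n_keep columns with the largest MAD" rule, the "sort by the first selected column and split into k chunks" scheme, and all function and parameter names (select_features, initial_centers, kmeans, n_keep, n_iter) are assumptions made here. It also assumes discrete or rounded values, since a mode is not informative for continuous data.

```python
from statistics import median, mode


def mad(values):
    """Median absolute deviation of a list of numbers."""
    m = median(values)
    return median(abs(v - m) for v in values)


def select_features(rows, n_keep):
    """Indices of the n_keep columns with the largest MAD (assumed rule)."""
    n_cols = len(rows[0])
    scores = [mad([r[j] for r in rows]) for j in range(n_cols)]
    ranked = sorted(range(n_cols), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:n_keep])


def initial_centers(rows, cols, k):
    """Assumed scheme: sort rows by the first selected column, split them
    into k contiguous chunks, and use the per-column mode of each chunk."""
    ordered = sorted(rows, key=lambda r: r[cols[0]])
    size = len(ordered) // k
    centers = []
    for i in range(k):
        end = (i + 1) * size if i < k - 1 else len(ordered)
        chunk = ordered[i * size:end]
        centers.append([mode([r[j] for r in chunk]) for j in cols])
    return centers


def kmeans(rows, cols, centers, n_iter=20):
    """Plain Lloyd-style k-means on the selected columns, started from
    the given centers instead of random ones."""
    points = [[r[j] for j in cols] for r in rows]
    centers = [list(c) for c in centers]  # do not mutate the caller's list
    labels = []
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(len(centers)),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        # Update step: each center becomes the mean of its members.
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Toy data (made up for illustration): the third column is constant,
# so its MAD is zero and it is dropped by the selection step.
data = [[1, 10, 5], [1, 11, 5], [2, 10, 5],
        [9, 20, 5], [9, 21, 5], [8, 20, 5]]
cols = select_features(data, n_keep=2)
labels, centers = kmeans(data, cols, initial_centers(data, cols, k=2))
```

Because the starting centers are computed from the data, repeated runs on the same input give the same clustering, which is the practical difference from random initialization.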
Keywords :
data handling; pattern clustering; MAD; clusters centers; data set; efficient initialization; feature selection; high dimensional data clustering; median absolute deviation; optimum clustering; random initialization; Algorithm design and analysis; Breast cancer; Classification algorithms; Clustering algorithms; Dairy products; Feature extraction; Indexes; Clustering; feature selection; high dimensional data; initial centers; k-means
Conference_Title :
Communication Systems and Network Technologies (CSNT), 2011 International Conference on
Conference_Location :
Katra, Jammu
Print_ISBN :
978-1-4577-0543-4
Electronic_ISBN :
978-0-7695-4437-3
DOI :
10.1109/CSNT.2011.70