DocumentCode :
3089604
Title :
Simultaneous feature selection and clustering for categorical features using multi objective genetic algorithm
Author :
Dutta, D. ; Dutta, Pranab ; Sil, J.
Author_Institution :
Dept. of Comput. Sci. & Inf. Technol., Univ. Inst. of Technol., Burdwan, India
fYear :
2012
fDate :
4-7 Dec. 2012
Firstpage :
191
Lastpage :
196
Abstract :
Clustering is unsupervised learning in which, ideally, neither the class labels nor the number of clusters (K) is known. K-clustering, where K is known, can be categorized as semi-supervised learning. Here we consider K-clustering with simultaneous feature selection. Feature subset selection helps to identify the features relevant for clustering, increases understandability, and improves scalability and accuracy. We use two clustering measures: intra-cluster distance (Homogeneity, H) and inter-cluster distance (Separation, S). Both measures use the mod distance per feature, which is suitable for categorical features (attributes). Rather than combining H and S to frame the problem as a single-objective optimization problem, we use a multi-objective genetic algorithm (MOGA) to find diverse solutions near the Pareto-optimal front in the two-dimensional objective space. Each evolved solution represents a set of cluster modes (CMs) built from the selected feature subset. K-modes is hybridized with MOGA to combine the global search power of the GA with the local search power of K-modes. Considering context sensitivity, we use special crossover operators called “pairwise crossover” and “substitution”. The main contribution of this paper is simultaneous dimensionality reduction and optimization of the objectives using MOGA. Results on three benchmark data sets with categorical features from the UCI Machine Learning Repository show the superiority of the algorithm.
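The two objectives named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact formulas, so the sketch assumes that the "mod distance per feature" is the fraction of selected features on which two objects differ, that H sums each object's distance to its nearest cluster mode (to be minimized), and that S sums the pairwise distances between cluster modes (to be maximized). All function names and the toy data are hypothetical.

```python
from itertools import combinations

def mismatch_per_feature(a, b, mask):
    """Assumed distance: fraction of selected features on which a and b differ."""
    selected = [i for i, keep in enumerate(mask) if keep]
    if not selected:
        raise ValueError("at least one feature must be selected")
    return sum(a[i] != b[i] for i in selected) / len(selected)

def homogeneity(data, modes, mask):
    """Assumed H: sum of each object's distance to its nearest mode (lower is better)."""
    return sum(min(mismatch_per_feature(x, m, mask) for m in modes) for x in data)

def separation(modes, mask):
    """Assumed S: sum of pairwise distances between cluster modes (higher is better)."""
    return sum(mismatch_per_feature(p, q, mask) for p, q in combinations(modes, 2))

def dominates(u, v):
    """True if score u = (H, S) Pareto-dominates v: no worse on both, better on one."""
    return u[0] <= v[0] and u[1] >= v[1] and (u[0] < v[0] or u[1] > v[1])

def pareto_front(scores):
    """Keep only the (H, S) pairs that no other pair dominates."""
    return [s for s in scores if not any(dominates(t, s) for t in scores)]

# Toy categorical data: the third feature is noise with respect to the two groups.
data = [("red", "small", "x"), ("red", "small", "y"),
        ("blue", "large", "x"), ("blue", "large", "y")]
modes = [("red", "small", "x"), ("blue", "large", "x")]

# Two candidate feature subsets, encoded as boolean masks over the features.
candidates = [(True, True, True), (True, True, False)]
scores = [(homogeneity(data, modes, m), separation(modes, m)) for m in candidates]
front = pareto_front(scores)
```

In this toy case, dropping the noisy third feature gives tighter clusters and better-separated modes, so that mask dominates the full feature set. In a MOGA, each chromosome would carry both a feature mask and a set of modes, and the non-dominated (H, S) pairs would form the approximation of the Pareto front.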
Keywords :
genetic algorithms; pattern clustering; unsupervised learning; MOGA; categorical features; class levels; feature identification; feature subset selection; intra-cluster distance; k-clustering; mod distance per feature; multiobjective genetic algorithm; objective optimization problem; semisupervised learning; simultaneous feature clustering; simultaneous feature selection; Biological cells; Buildings; Clustering algorithms; Equations; Mathematical model; Sociology; Statistics
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Hybrid Intelligent Systems (HIS), 2012 12th International Conference on
Conference_Location :
Pune
Print_ISBN :
978-1-4673-5114-0
Type :
conf
DOI :
10.1109/HIS.2012.6421332
Filename :
6421332