• DocumentCode
    3089604
  • Title

    Simultaneous feature selection and clustering for categorical features using multi objective genetic algorithm

  • Author

    Dutta, D. ; Dutta, Pranab ; Sil, J.

  • Author_Institution
    Dept. of Comput. Sci. & Inf. Technol., Univ. Inst. of Technol., Burdwan, India
  • fYear
    2012
  • fDate
    4-7 Dec. 2012
  • Firstpage
    191
  • Lastpage
    196
  • Abstract
    Clustering is unsupervised learning in which, ideally, neither the class labels nor the number of clusters (K) are known. K-clustering, where K is known, can be categorized as semi-supervised learning. Here we consider K-clustering with simultaneous feature selection. Feature subset selection helps to identify the features relevant for clustering, increases understandability, improves scalability, and improves accuracy. We use two measures for clustering: intra-cluster distance (Homogeneity, H) and inter-cluster distance (Separation, S). Both measures use the mod distance per feature, which is suitable for categorical features (attributes). Rather than combining H and S to frame the problem as a single-objective optimization problem, we use a multi-objective genetic algorithm (MOGA) to find diverse solutions near the Pareto-optimal front in the two-dimensional objective space. Each evolved solution represents a set of cluster modes (CMs) built from the selected feature subset. K-modes is hybridized with MOGA to combine the global search power of the GA with the local search power of K-modes. Considering context sensitivity, we use a special crossover operator called “pairwise crossover”, together with “substitution”. The main contribution of this paper is simultaneous dimensionality reduction and optimization of the objectives using MOGA. Results on three benchmark data sets with categorical features from the UCI Machine Learning Repository show the superiority of the algorithm.
  • Keywords
    genetic algorithms; pattern clustering; unsupervised learning; MOGA; categorical features; class levels; feature identification; feature subset selection; intra-cluster distance; k-clustering; mod distance per feature; multiobjective genetic algorithm; objective optimization problem; semisupervised learning; simultaneous feature clustering; simultaneous feature selection; Biological cells; Buildings; Clustering algorithms; Equations; Mathematical model; Sociology; Statistics
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Hybrid Intelligent Systems (HIS), 2012 12th International Conference on
  • Conference_Location
    Pune
  • Print_ISBN
    978-1-4673-5114-0
  • Type

    conf

  • DOI
    10.1109/HIS.2012.6421332
  • Filename
    6421332