• DocumentCode
    3090201
  • Title

    Semi-supervised clustering using multiobjective optimization

  • Author

    Saha, Simanto ; Ekbal, Asif ; Alok, Abhay Kumar

  • Author_Institution
    Comput. Sci. Eng., Indian Inst. of Technol. Patna, Patna, India
  • fYear
    2012
  • fDate
    4-7 Dec. 2012
  • Firstpage
    360
  • Lastpage
    365
  • Abstract
    Semi-supervised clustering combines information from unsupervised and supervised learning to overcome the limitations of each. During the clustering process, this information is supplied in the form of class labels and the data distribution. In this paper, the problem of semi-supervised clustering is formulated under the framework of multiobjective optimization (MOO). A multiobjective clustering technique is then extended to solve the semi-supervised clustering problem. The resulting semi-supervised multiobjective clustering algorithm, Semi-GenClustMOO, partitions the data into an appropriate number of clusters. Four objective functions are optimized, of which the first three use unsupervised information and the last uses supervised information. They measure, respectively, the total compactness of the partitioning, the total symmetry present in the clusters, the cluster connectedness, and the Adjusted Rand Index. The four objectives are optimized simultaneously using AMOSA, a recently developed simulated annealing based multiobjective optimization method. Results show that the technique can detect the appropriate number of clusters as well as the appropriate partitioning for data sets having either well-separated clusters of any shape or symmetrical clusters with or without overlaps. Seven artificial and four real-life data sets are used to evaluate the effectiveness of Semi-GenClustMOO. In each case, the class labels of 10% of the data points, chosen at random, are assumed to be known.
  • Keywords
    data handling; learning (artificial intelligence); optimisation; pattern clustering; Adjust Rand Index; MOO; Semi-GenClustMOO; cluster connectedness; clustering process; data distribution; extracted information; multiobjective optimization; semisupervised multiobjective clustering algorithm; supervised information; supervised learning; unsupervised information; Clustering algorithms; Distributed databases; Euclidean distance; Indexes; Linear programming; Optimization; Partitioning algorithms; AMOSA; Adjusted Rand Index (ARI); Cluster validity index; Con-index; I-Index; Multiobjective optimization; Semi-supervised clustering; Sym-index;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Hybrid Intelligent Systems (HIS), 2012 12th International Conference on
  • Conference_Location
    Pune
  • Print_ISBN
    978-1-4673-5114-0
  • Type

    conf

  • DOI
    10.1109/HIS.2012.6421361
  • Filename
    6421361
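
The abstract names the Adjusted Rand Index as the one objective that uses supervised information. As a minimal sketch of how such an objective could be evaluated, the Python below computes the standard pair-counting ARI and restricts it to the points whose class label is known. The function names and the `known_labels` dictionary layout are assumptions made for illustration; the record does not describe how Semi-GenClustMOO implements this objective.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same points."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)
    # Pair counts taken from the contingency table and its row/column sums.
    pairs_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    pairs_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    pairs_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two points: nothing to disagree on
    expected = pairs_true * pairs_pred / total_pairs
    max_index = (pairs_true + pairs_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are all singletons
    return (pairs_both - expected) / (max_index - expected)


def supervised_objective(cluster_ids, known_labels):
    """ARI restricted to the points whose class label is known.

    cluster_ids  : cluster assignment for every point (list indexed by point).
    known_labels : hypothetical mapping {point index: class label} covering
                   only the labeled subset (10% of the points in the abstract).
    """
    idx = sorted(known_labels)
    return adjusted_rand_index(
        [known_labels[i] for i in idx],
        [cluster_ids[i] for i in idx],
    )
```

Because the index compares only co-membership of pairs, a clustering that matches the known classes up to a renaming of cluster identifiers still scores 1, so the labeled subset constrains the partition without fixing the cluster numbering.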