DocumentCode :
3090201
Title :
Semi-supervised clustering using multiobjective optimization
Author :
Saha, Simanto ; Ekbal, Asif ; Alok, Abhay Kumar
Author_Institution :
Comput. Sci. Eng., Indian Inst. of Technol. Patna, Patna, India
fYear :
2012
fDate :
4-7 Dec. 2012
Firstpage :
360
Lastpage :
365
Abstract :
Semi-supervised clustering combines information from unsupervised and supervised learning to overcome the limitations of each. The information is supplied during clustering in the form of class labels and the data distribution. In this paper, the problem of semi-supervised clustering is formulated under the framework of multiobjective optimization (MOO). A multiobjective clustering technique is then extended to solve the semi-supervised clustering problem. The newly developed semi-supervised multiobjective clustering algorithm, Semi-GenClustMOO, is used to partition the data into an appropriate number of clusters. Four objective functions are optimized: the first three use unsupervised information and the last uses supervised information. They represent, respectively, the total compactness of the partitioning, the total symmetry present in the clusters, the cluster connectedness, and the Adjusted Rand Index. These four objective functions are optimized simultaneously using AMOSA, a newly developed simulated annealing based multiobjective optimization method. Results show that the technique can detect the appropriate number of clusters as well as the appropriate partitioning for data sets having either well-separated clusters of any shape or symmetrical clusters with or without overlaps. Seven artificial and four real-life data sets are used to evaluate the effectiveness of Semi-GenClustMOO. In each case, the class information of 10% of the data points, chosen at random, is assumed to be known.
Keywords :
data handling; learning (artificial intelligence); optimisation; pattern clustering; Adjust Rand Index; MOO; Semi-GenClustMOO; cluster connectedness; clustering process; data distribution; extracted information; multiobjective optimization; semisupervised multiobjective clustering algorithm; supervised information; supervised learning; unsupervised information; Clustering algorithms; Distributed databases; Euclidean distance; Indexes; Linear programming; Optimization; Partitioning algorithms; AMOSA; Adjusted Rand Index (ARI); Cluster validity index; Con-index; I-Index; Multiobjective optimization; Semi-supervised clustering; Sym-index;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Hybrid Intelligent Systems (HIS), 2012 12th International Conference on
Conference_Location :
Pune
Print_ISBN :
978-1-4673-5114-0
Type :
conf
DOI :
10.1109/HIS.2012.6421361
Filename :
6421361