Title :
Improvements in the Partitions Selection Strategy for Set of Clustering Solutions
Author :
Sakata, Tiemi C. ; Faceli, Katti ; de Souto, Marcilio C. P. ; de Carvalho, Andre C. P. L. F.
Author_Institution :
Univ. Fed. de Sao Carlos - Campus Sorocaba, Sorocaba, Brazil
Abstract :
No clustering algorithm is guaranteed to find the actual groups in every dataset. Thus, selecting the most suitable clustering algorithm for a given dataset is not easy. To deal with this problem, one can apply various clustering algorithms to the dataset, generating a set of partitions (solutions). Next, one can choose the best partition generated according to a given validation measure, although such measures are usually biased towards one or more clustering algorithms. However, in many cases, it is interesting to have more than one solution. In a previous work, we proposed a selection strategy able to reduce the number of solutions obtained from Pareto-based multi-objective genetic algorithms. This selection strategy uses the corrected Rand index to select a subset of the most different partitions. The size of the solution set is controlled by a threshold on the value of this index, given as an external parameter. Reducing the threshold value decreases the number of solutions. Since the choice of such a threshold value is not intuitive, this paper describes a modification of the original selection algorithm that automatically adjusts this threshold and guarantees the selection of the most evident partitions, which were simultaneously obtained with distinct clustering criteria. The new version does not require any user settings, yields a more suitable number of solutions, and maintains the diversity of the partitions in the reduced set.
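The threshold-based selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the greedy keep-if-dissimilar rule, the function names `corrected_rand` and `select_diverse`, and the toy partitions are all assumptions made for illustration. It only shows how a threshold on the corrected Rand index can control the size of the reduced set, with a lower threshold keeping fewer partitions.

```python
from collections import Counter
from math import comb


def corrected_rand(a, b):
    """Corrected (adjusted) Rand index between two label sequences."""
    total = comb(len(a), 2)  # number of object pairs
    if total == 0:  # fewer than two objects: treat as identical
        return 1.0
    # Pairs of objects placed together in both partitions, in a, and in b.
    sum_ij = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. two trivial partitions
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def select_diverse(partitions, threshold):
    """Hypothetical greedy filter: keep a partition only if its corrected
    Rand index with every partition already kept is below `threshold`."""
    selected = []
    for p in partitions:
        if all(corrected_rand(p, q) < threshold for q in selected):
            selected.append(p)
    return selected


# Toy partitions of six objects (illustrative data, not from the paper).
p1 = [0, 0, 0, 1, 1, 1]
p2 = [1, 1, 1, 0, 0, 0]  # same grouping as p1 with labels swapped
p3 = [0, 1, 0, 1, 0, 1]

reduced = select_diverse([p1, p2, p3], threshold=0.9)
```

Here `p2` is dropped because it is the same grouping as `p1` under relabeling, while `p3` is kept because it differs from `p1`. The paper's contribution is to adjust the threshold automatically, which this sketch does not attempt.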
Keywords :
data analysis; genetic algorithms; pattern clustering; Pareto based multiobjective genetic algorithm; Rand index; clustering algorithm; threshold value; Clustering algorithms; Computational efficiency; Electronic mail; Glass; Indexes; Iris; Partitioning algorithms; cluster analysis; model selection
Conference_Title :
Neural Networks (SBRN), 2010 Eleventh Brazilian Symposium on
Conference_Location :
Sao Paulo
Print_ISBN :
978-1-4244-8391-4
Electronic_ISBN :
1522-4899
DOI :
10.1109/SBRN.2010.17