Clustering validity assessment: finding the optimal partitioning of a data set

Author

Halkidi, Maria ; Vazirgiannis, Michalis

fYear

2001

fDate

2001

Firstpage

187

Lastpage

194

Abstract

Clustering is a mostly unsupervised procedure and the majority of clustering algorithms depend on certain assumptions in order to define the subgroups present in a data set. As a consequence, in most applications the resulting clustering scheme requires some sort of evaluation regarding its validity. In this paper we present a clustering validity procedure, which evaluates the results of clustering algorithms on data sets. We define a validity index, S_Dbw, based on well-defined clustering criteria enabling the selection of optimal input parameter values for a clustering algorithm that result in the best partitioning of a data set. We evaluate the reliability of our index both theoretically and experimentally, considering three representative clustering algorithms run on synthetic and real data sets. We also carried out an evaluation study to compare S_Dbw performance with other known validity indices. Our approach performed favorably in all cases, even those in which other indices failed to indicate the correct partitions in a data set.

Keywords

data mining; pattern clustering; S_Dbw validity index; clustering algorithms; clustering validity assessment; optimal partitioning data set; reliability; Clustering algorithms; Data visualization; Geometry; Humans; Informatics; Multidimensional systems; Partitioning algorithms; Radio access networks; Reliability theory; Visual perception

fLanguage

English

Publisher

IEEE

Conference_Title

Proceedings of the 2001 IEEE International Conference on Data Mining (ICDM 2001)

Conference_Location

San Jose, CA

Print_ISBN

0-7695-1119-8

Type

conf

DOI

10.1109/ICDM.2001.989517

Filename

989517