A parameterless method for efficiently discovering clusters of arbitrary shape in large datasets

Author

Foss, Andrew; Zaïane, Osmar R.

Author_Institution

Alberta Univ., Edmonton, Alta., Canada

fYear

2002

fDate

2002

Firstpage

179

Lastpage

186

Abstract

Clustering is the problem of grouping data based on similarity: it consists of maximizing intra-group similarity while minimizing inter-group similarity. Clustering is also known as unsupervised classification, since no class labels are given. However, all existing clustering algorithms require some parameters to steer the clustering process, such as the well-known k for the number of expected clusters, and this constitutes a form of supervision. In this paper we present a new, efficient, fast, and scalable clustering algorithm that clusters over a range of resolutions and finds a potential optimum clustering without requiring any parameter input. Our experiments show that our algorithm outperforms most existing clustering algorithms in quality and speed on large data sets.

Keywords

data mining; minimisation; pattern clustering; arbitrarily shaped cluster discovery; clustering; efficient clustering algorithm; fast clustering algorithm; inter-group similarity minimization; intra-group similarity maximization; large datasets; parameterless method; scalable clustering algorithm; unsupervised classification; Clustering algorithms; Clustering methods; Gravity; Multi-stage noise shaping; Noise shaping; Partitioning algorithms; Scalability; Shape;

fLanguage

English

Publisher

IEEE

Conference_Titel

Data Mining, 2002. ICDM 2002. Proceedings. 2002 IEEE International Conference on

Print_ISBN

0-7695-1754-4

Type

conf

DOI

10.1109/ICDM.2002.1183901

Filename

1183901