Title of article
A distance based clustering method for arbitrary shaped clusters in large datasets
Author/Authors
Patra, Bidyut Kr. and Nandi, Sukumar and Viswanath, P.
Issue Information
Journal with serial number, year 2011
Pages
9
From page
2862
To page
2870
Abstract
Clustering is widely used across science, technology, and the social sciences. Clusters in real datasets often have arbitrary (non-convex) shapes. Distance based methods form one important class of clustering methods, but they usually find only clusters of convex shape. The classical single-link method is a distance based method that can find arbitrary shaped clusters. However, it scans the dataset multiple times and requires O(n^2) time, where n is the size of the dataset, which is potentially a severe problem for large datasets. In this paper, we propose a distance based clustering method, l-SL, to find arbitrary shaped clusters in a large dataset. The method first applies the leaders clustering method to the dataset to derive a set of leaders; it then applies the single-link method (with a distance stopping criterion) to the set of leaders to obtain the final clustering. The l-SL method produces a flat clustering. It is considerably faster than the single-link method applied directly to the dataset. The clustering produced by l-SL may deviate slightly from the final clustering of the single-link method (with a distance stopping criterion) applied directly to the dataset. To compensate for this deviation, an improvement method is also proposed. Experiments are conducted with standard real world and synthetic datasets. The experimental results show the effectiveness of the proposed clustering methods for large datasets.
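The two-stage idea described in the abstract (leaders first, then single-link on the leaders only) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the leaders threshold `tau`, the stopping distance `h`, the use of Euclidean distance, and the rule of assigning each point to the first leader within `tau` are all assumptions made for illustration. The improvement method mentioned in the abstract is not shown.

```python
import math


def leaders(points, tau):
    """One-pass leaders clustering (assumed variant: first leader within tau wins).

    Returns the list of leaders and, for every input point, the index of
    the leader it follows.
    """
    leader_pts = []
    assignment = []
    for p in points:
        for i, leader in enumerate(leader_pts):
            if math.dist(p, leader) <= tau:
                assignment.append(i)
                break
        else:
            # No existing leader is close enough: p becomes a new leader.
            leader_pts.append(p)
            assignment.append(len(leader_pts) - 1)
    return leader_pts, assignment


def single_link(points, h):
    """Single-link with a distance stopping criterion h.

    Merging clusters while the closest pair is at most h apart yields the
    connected components of the graph that links points at distance <= h,
    so a union-find structure is enough for a flat clustering.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= h:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(points))]


def l_sl(points, tau, h):
    """Hypothetical l-SL pipeline: cluster the leaders, then label followers."""
    leader_pts, assignment = leaders(points, tau)
    leader_labels = single_link(leader_pts, h)
    return [leader_labels[a] for a in assignment]


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (5.0, 5.0), (5.1, 5.0)]
    print(l_sl(data, tau=0.15, h=0.5))
```

In this sketch the quadratic pairwise step runs over the m leaders rather than over all n points, which is where the speed-up comes from when m is much smaller than n. Because followers inherit the label of their leader, the result can differ from single-link run on the full dataset, in line with the deviation the abstract mentions.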
Keywords
Distance based clustering , Single-link , Hybrid clustering method , Large datasets , Arbitrary shaped clusters , Leaders
Journal title
PATTERN RECOGNITION
Serial Year
2011
Record number
1734205