DocumentCode :
3036347
Title :
Gradual clustering algorithms
Author :
Wu, Fei ; Gardarin, Georges
Author_Institution :
PRiSM Lab., Versailles Univ., Versailles, France
fYear :
2001
fDate :
21-21 April 2001
Firstpage :
48
Lastpage :
55
Abstract :
Clustering is an important technique in data mining. Its objective is to group objects into clusters such that objects within a cluster are more similar to each other than to objects in different clusters. The similarity between two objects is defined by a distance function, e.g., the Euclidean distance, which satisfies the triangle inequality. Distance calculation is computationally expensive, and many algorithms have been proposed to reduce this cost. This paper considers the gradual clustering problem. In practice, we noticed that users often begin clustering on a small number of attributes, e.g., two. If the result is partially satisfactory, the user continues clustering on a larger number of attributes, e.g., ten. We refer to this as the gradual clustering problem; it can be viewed as vertically incremental clustering. We propose approaches to solve this problem. The main idea is to reduce the number of distance calculations by using the triangle inequality. Our method first stores in an index the distances between a representative object and the objects in n-dimensional space. These pre-computed distances are then used to avoid distance calculations in (n+m)-dimensional space. Two experiments on real data sets demonstrate the added value of our approaches. The implemented algorithms are based on the DBSCAN algorithm with an associated M-Tree as the index tree. However, the principles of our idea can be integrated with other tree structures, such as the MVP-Tree and R*-Tree, and with other clustering algorithms.
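The pruning idea in the abstract can be illustrated with a minimal sketch. It is one plausible reading of the abstract, not the authors' implementation: it uses a flat list rather than an M-Tree, and the function names, the choice of representative object (here called the pivot), and the sample data are all assumptions made for illustration. The sketch relies on two facts about Euclidean distance: adding attributes never decreases the distance, and by the triangle inequality |d(r,p) - d(r,q)| is a lower bound on d(p,q). If that lower bound, taken from the stored n-dimensional distances, already exceeds the neighbourhood radius eps, the (n+m)-dimensional distance need not be computed.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_pivot_index(points, pivot, n):
    """Store the distance from the pivot to every point, using only the
    first n attributes (hypothetical stand-in for the paper's index)."""
    return [euclidean(pivot[:n], p[:n]) for p in points]


def range_query(points, pivot_dists, query_idx, eps):
    """Find all points within eps of points[query_idx] in the full space.

    Returns (neighbour indices, number of full distance computations).
    A candidate is skipped when the lower bound derived from the stored
    n-dimensional pivot distances already exceeds eps.
    """
    q = points[query_idx]
    dq = pivot_dists[query_idx]
    neighbours, computed = [], 0
    for i, p in enumerate(points):
        # |d_n(r,p) - d_n(r,q)| <= d_n(p,q) <= d_{n+m}(p,q)
        if abs(pivot_dists[i] - dq) > eps:
            continue  # cannot be within eps; skip the full computation
        computed += 1
        if euclidean(p, q) <= eps:
            neighbours.append(i)
    return neighbours, computed


# Illustrative data: four attributes, index built on the first two.
points = [
    (0.0, 0.0, 0.0, 0.0),
    (0.1, 0.0, 0.0, 0.1),
    (5.0, 5.0, 0.0, 0.0),
    (5.0, 5.1, 3.0, 3.0),
    (0.0, 0.1, 0.05, 0.0),
]
pivot_dists = build_pivot_index(points, pivot=points[0], n=2)
neighbours, computed = range_query(points, pivot_dists, query_idx=0, eps=0.5)
```

A range query of this kind is the step DBSCAN repeats for every object, so any full distance computation it skips is saved many times over. The filter only removes candidates that provably lie outside eps, so the result matches a brute-force scan.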
Keywords :
data mining; pattern clustering; tree data structures; very large databases; DBSCAN algorithm; Euclidean distance; M-Tree; MVP-Tree; R*-Tree; data sets; distance calculations; distance function; experiments; gradual clustering algorithms; index tree; triangular inequality; vertically incremental clustering; Clustering algorithms; Databases; Laboratories; Partitioning algorithms; Robustness
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Database Systems for Advanced Applications, 2001. Proceedings. Seventh International Conference on
Conference_Location :
Hong Kong, China
Print_ISBN :
0-7695-0996-7
Type :
conf
DOI :
10.1109/DASFAA.2001.916364
Filename :
916364