DocumentCode :
3249348
Title :
O-Cluster: scalable clustering of large high dimensional data sets
Author :
Milenova, Boriana L. ; Campos, Marcos M.
Author_Institution :
Oracle Data Min. Technol., Burlington, MA, USA
fYear :
2002
fDate :
2002
Firstpage :
290
Lastpage :
297
Abstract :
Clustering large data sets of high dimensionality has always been a challenge for clustering algorithms. Many recently developed clustering algorithms have attempted to handle data sets with either a very large number of records or a very high number of dimensions. We provide a discussion of the advantages and limitations of existing algorithms when they operate on very large multidimensional data sets. To simultaneously overcome both the "curse of dimensionality" and the scalability problems associated with large amounts of data, we propose a new clustering algorithm called O-Cluster. O-Cluster combines a novel active sampling technique with an axis-parallel partitioning strategy to identify continuous areas of high density in the input space. The method operates on a limited memory buffer and requires at most a single scan through the data. We demonstrate the high quality of the obtained clustering solutions, their robustness to noise, and O-Cluster's excellent scalability.
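To make "axis-parallel partitioning to identify continuous areas of high density" concrete, here is a minimal sketch, assuming the partitioning works roughly as follows: build a one-dimensional histogram per attribute, look for a low-density valley bin lying between two higher peaks, accept the split only if the valley is statistically lower than the smaller peak, and recurse on both sides. The 2-cell chi-square test against the 95% critical value (3.841, one degree of freedom), the bin count of 10, and the `min_size` stopping rule are hypothetical choices made for illustration, not the paper's exact procedure. The sketch also omits O-Cluster's active sampling and limited memory buffer entirely; it operates on an in-memory list of points.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence

# Chi-square critical value, 1 degree of freedom, 95% confidence (illustrative assumption).
CHI2_CRIT_95_DF1 = 3.841

Point = Sequence[float]


@dataclass
class Split:
    dim: int          # attribute used for the axis-parallel cut
    threshold: float  # cut position (midpoint of the valley bin)
    chi2: float       # strength of the valley versus the lower peak


def histogram(values: Sequence[float], lo: float, hi: float, bins: int) -> List[int]:
    """Equal-width histogram; the maximum value is clamped into the last bin."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = int((v - lo) / width)
        counts[min(max(idx, 0), bins - 1)] += 1
    return counts


def best_valley_split(points: Sequence[Point], bins: int = 10) -> Optional[Split]:
    """Return the most significant valley (peak / valley / peak) over all attributes."""
    best: Optional[Split] = None
    for d in range(len(points[0])):
        col = [p[d] for p in points]
        lo, hi = min(col), max(col)
        if hi == lo:
            continue  # constant attribute: nothing to split
        counts = histogram(col, lo, hi, bins)
        width = (hi - lo) / bins
        for i in range(1, bins - 1):
            left_peak = max(counts[:i])
            right_peak = max(counts[i + 1:])
            valley = counts[i]
            lower_peak = min(left_peak, right_peak)
            if valley >= lower_peak:
                continue  # not a valley between two peaks
            # 2-cell chi-square: valley vs. lower peak, equal expected counts.
            chi2 = (lower_peak - valley) ** 2 / (lower_peak + valley)
            if chi2 < CHI2_CRIT_95_DF1:
                continue  # valley not significantly lower than the peak
            if best is None or chi2 > best.chi2:
                best = Split(dim=d, threshold=lo + (i + 0.5) * width, chi2=chi2)
    return best


def partition(points: Sequence[Point], bins: int = 10, min_size: int = 20) -> List[List[Point]]:
    """Recursively cut along significant valleys; each leaf is one dense region."""
    split = best_valley_split(points, bins) if len(points) >= min_size else None
    if split is None:
        return [list(points)]
    left = [p for p in points if p[split.dim] < split.threshold]
    right = [p for p in points if p[split.dim] >= split.threshold]
    return partition(left, bins, min_size) + partition(right, bins, min_size)
```

For example, two groups of 50 points separated along the first attribute, such as `A = [(0.01 * k, 0.1 * k) for k in range(50)]` and `B = [(10 + 0.01 * k, 0.1 * k) for k in range(50)]`, are cut on attribute 0 and `partition(A + B)` yields two leaves of 50 points each. Because each cut is a single threshold on one attribute, the resulting regions are axis-parallel hyper-rectangles, which keeps the search cost linear in the number of dimensions.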
Keywords :
computational complexity; data mining; pattern clustering; very large databases; O-Cluster; active sampling technique; axis-parallel partitioning strategy; complexity; data handling; data mining; large high dimensional data sets; limited memory buffer; multidimensional data sets; scalability; scalable clustering; Clustering algorithms; Computational complexity; Data mining; Information retrieval; Multidimensional systems; Multimedia databases; Partitioning algorithms; Sampling methods; Scalability; Shape;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining, 2002. ICDM 2002. Proceedings. 2002 IEEE International Conference on
Print_ISBN :
0-7695-1754-4
Type :
conf
DOI :
10.1109/ICDM.2002.1183915
Filename :
1183915