DocumentCode
3249348
Title
O-Cluster: scalable clustering of large high dimensional data sets
Author
Milenova, Boriana L. ; Campos, Marcos M.
Author_Institution
Oracle Data Min. Technol., Burlington, MA, USA
fYear
2002
fDate
2002
Firstpage
290
Lastpage
297
Abstract
Clustering large data sets of high dimensionality has always been a challenge for clustering algorithms. Many recently developed clustering algorithms have attempted to address either handling data sets with a very large number of records and/or with a very high number of dimensions. We provide a discussion of the advantages and limitations of existing algorithms when they operate on very large multidimensional data sets. To simultaneously overcome both the "curse of dimensionality" and the scalability problems associated with large amounts of data, we propose a new clustering algorithm called O-Cluster. O-Cluster combines a novel active sampling technique with an axis-parallel partitioning strategy to identify continuous areas of high density in the input space. The method operates on a limited memory buffer and requires at most a single scan through the data. We demonstrate the high quality of the obtained clustering solutions, their robustness to noise, and O-Cluster's excellent scalability.
Keywords
computational complexity; data mining; pattern clustering; very large databases; O-Cluster; active sampling technique; axis-parallel partitioning strategy; complexity; data handling; data mining; large high dimensional data sets; limited memory buffer; multidimensional data sets; scalability; scalable clustering; Clustering algorithms; Computational complexity; Data mining; Information retrieval; Multidimensional systems; Multimedia databases; Partitioning algorithms; Sampling methods; Scalability; Shape
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining, 2002. ICDM 2002. Proceedings. 2002 IEEE International Conference on
Print_ISBN
0-7695-1754-4
Type
conf
DOI
10.1109/ICDM.2002.1183915
Filename
1183915