DocumentCode :
3310746
Title :
Clustering high dimensional massive scientific datasets
Author :
Otoo, Ekow J. ; Shoshani, Arie ; Hwang, Seung-Won
Author_Institution :
Lawrence Berkeley Nat. Lab., California Univ., Berkeley, CA, USA
fYear :
2001
fDate :
2001
Firstpage :
147
Lastpage :
157
Abstract :
Many scientific applications can benefit from an efficient algorithm for clustering massively large, high-dimensional datasets. However, most existing algorithms are impractical when the amount of data is very large. Given N objects, each defined by an M-dimensional feature vector, any clustering technique for very large datasets in high-dimensional space should run in O(N) time at best and O(N log N) time in the worst case, using no more than O(NM) storage, to be practical. A parallelized version of the same algorithm should achieve linear speed-up in processing time as the number of processors increases. We introduce a hybrid algorithm called HyCeltyc as an approach for clustering massively large, high-dimensional datasets. HyCeltyc, which stands for Hybrid Cell Density Clustering method, combines a cell-density-based algorithm with a hierarchical agglomerative method to identify clusters in linear time. The main steps of the algorithm involve sampling, dimensionality reduction, and selection of significant features on which to cluster the data.
Keywords :
query processing; scientific information systems; very large databases; HyCeltyc; Hybrid Cell Density Clustering method; data clustering; dimensionality reduction; feature vector; hierarchical agglomerative method; high dimensional massive scientific datasets; large databases; parallel algorithm; sampling; scientific applications; Application software; Clustering algorithms; Clustering methods; Cyclotrons; Ear; Multidimensional systems; Query processing; Samarium; Sampling methods; Vectors
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Scientific and Statistical Database Management, 2001. SSDBM 2001. Proceedings. Thirteenth International Conference on
Conference_Location :
Fairfax, VA
ISSN :
1099-3371
Print_ISBN :
0-7695-1218-6
Type :
conf
DOI :
10.1109/SSDM.2001.938547
Filename :
938547