DocumentCode :
2082564
Title :
Finding Clusters in subspaces of very large, multi-dimensional datasets
Author :
Cordeiro, Robson L F ; Traina, Agma J M ; Faloutsos, Christos ; Traina, Caetano, Jr.
Author_Institution :
Comput. Sci. Dept. - ICMC, Univ. of Sao Paulo, Sao Carlos, Brazil
fYear :
2010
fDate :
1-6 March 2010
Firstpage :
625
Lastpage :
636
Abstract :
We propose Multi-resolution Correlation Cluster detection (MrCC), a novel, scalable method for detecting correlation clusters in multi-dimensional data with roughly 5 to 30 axes. Existing methods typically exhibit super-linear behavior in space or execution time. MrCC employs a novel data structure based on multi-resolution and improves on previous approaches in that: (a) it finds clusters that stand out in the data in a statistical sense; (b) it is linear in running time and memory usage with respect to the number of data points and the dimensionality of the subspaces where clusters exist; (c) it is linear in memory usage and quasi-linear in running time with respect to the space dimensionality; and (d) it is accurate, deterministic, and robust to noise, does not require the number of clusters as an input parameter, does not perform distance calculations, and is able to detect clusters in subspaces generated by the original axes or by linear combinations of the original axes, including space rotation. We performed experiments on synthetic data ranging from 5 to 30 axes and from 12k to 250k points, and MrCC outperformed five recent related methods in running time, being on average 10 times faster than the competitors that also achieved high accuracy on every tested dataset. On real data, MrCC found clusters at least 9 times faster than the competitors, improving on their accuracy by up to 34 percent.
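To make the idea of a multi-resolution structure that avoids distance calculations more concrete, here is a minimal sketch of a generic multi-resolution grid count. It is not the authors' MrCC data structure or its statistical test. It assumes every coordinate is already normalized to [0, 1) and splits each axis into 2**h intervals at level h. The helper names `build_multiresolution_counts` and `dense_cells`, and the `factor` threshold, are hypothetical illustrations. One pass over the data costs O(N · levels · d) and involves no pairwise distances. The density check is a naive stand-in that compares each cell with its expected count under a uniform distribution.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, ...]


def build_multiresolution_counts(
    points: Sequence[Sequence[float]], levels: int
) -> List[Dict[Cell, int]]:
    """Count points per grid cell at resolutions 1..levels (hypothetical helper).

    Assumes every coordinate lies in [0, 1). At level h each axis is split
    into 2**h equal intervals; only occupied cells are stored.
    """
    counts: List[Counter] = [Counter() for _ in range(levels)]
    for p in points:  # single pass over the data, no distance computations
        for h in range(1, levels + 1):
            side = 2 ** h
            # Integer cell index per axis; min() keeps a coordinate of exactly 1.0 in range.
            cell = tuple(min(int(x * side), side - 1) for x in p)
            counts[h - 1][cell] += 1
    return [dict(c) for c in counts]


def dense_cells(
    level_counts: Dict[Cell, int], n_points: int, side: int, dims: int, factor: float = 2.0
) -> Dict[Cell, int]:
    """Naive density check (not MrCC's statistical test): keep cells whose
    count exceeds `factor` times the count expected under a uniform distribution."""
    expected = n_points / side ** dims
    return {cell: c for cell, c in level_counts.items() if c > factor * expected}


if __name__ == "__main__":
    pts = [(0.1, 0.1), (0.12, 0.14), (0.9, 0.2)]
    grid = build_multiresolution_counts(pts, levels=2)
    print(grid[0])  # level 1: 2 x 2 cells per 2-D grid
    print(dense_cells(grid[0], n_points=len(pts), side=2, dims=2))
```

With the three sample points, the level-1 grid holds two points in cell (0, 0) and one in cell (1, 0). Only (0, 0) exceeds twice the uniform expectation of 0.75 points per cell. Storing only occupied cells keeps memory proportional to the data rather than to the 2**(h·d) possible cells.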
Keywords :
data structures; pattern clustering; correlation clusters; data structure; multidimensional datasets; multiresolution correlation cluster detection; space dimensionality; Brazil Council; Clustering methods; Computer science; Data analysis; Data structures; Noise generators; Noise robustness; Performance evaluation; Performance gain; Testing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Engineering (ICDE), 2010 IEEE 26th International Conference on
Conference_Location :
Long Beach, CA
Print_ISBN :
978-1-4244-5445-7
Electronic_ISBN :
978-1-4244-5444-0
Type :
conf
DOI :
10.1109/ICDE.2010.5447924
Filename :
5447924