DocumentCode :
3145005
Title :
Consensus spectral clustering in near-linear time
Author :
Luo, Dijun ; Ding, Chris ; Huang, Heng ; Nie, Feiping
Author_Institution :
Dept. of Comput. Sci. & Eng., Univ. of Texas at Arlington, Arlington, TX, USA
fYear :
2011
fDate :
11-16 April 2011
Firstpage :
1079
Lastpage :
1090
Abstract :
This paper addresses the scalability of spectral analysis, which is widely used in data management applications. Spectral analysis techniques offer powerful clustering capability but suffer from high computational complexity. In most previous research, the computational bottleneck of spectral analysis stems from constructing the pairwise similarity matrix among objects, which costs at least O(n^2), where n is the number of data points. In this paper, we propose a novel estimator of the similarity matrix based on a K-means accumulative consensus matrix, which is intrinsically sparse. The computational cost of the accumulative consensus matrix is O(n log n). We further develop a Non-negative Matrix Factorization approach to derive the clustering assignment. The overall complexity of our approach remains O(n log n). To validate our method, we (1) theoretically show the locality-preserving and convergence properties of the similarity estimator, (2) validate it on a large number of real-world datasets and compare the results with other state-of-the-art spectral analysis methods, and (3) apply it to large-scale data clustering problems. Results show that our approach uses much less computational time than other state-of-the-art clustering methods while providing comparable clustering quality. We also successfully apply our approach to a 5-million-point dataset on a single machine in reasonable time. Our techniques open a new direction for high-quality large-scale data analysis.
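The abstract does not spell out how the accumulative consensus matrix is built, so the following is only a minimal sketch of the general idea, assuming the common co-association definition: run K-means several times and record, for each pair of points, the fraction of runs in which they share a cluster. The names `kmeans_labels` and `consensus_matrix`, the parameters `k`, `runs`, `iters`, and `seed`, and the dictionary-based sparse storage are all hypothetical choices for illustration. This naive version enumerates pairs inside each cluster, so it does not reach the O(n log n) cost stated in the abstract, and it omits the Non-negative Matrix Factorization step.

```python
import random
from collections import defaultdict
from itertools import combinations


def kmeans_labels(points, k, rng, iters=10):
    """Plain Lloyd's K-means on a list of equal-length tuples; returns one label per point."""
    centers = rng.sample(points, k)  # hypothetical init: k distinct input points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each non-empty cluster's center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def consensus_matrix(points, k, runs, seed=0):
    """Assumed co-association estimate: fraction of runs in which points i < j share a cluster.

    Only pairs that co-occur at least once are stored, so the result is sparse
    whenever clusters are small relative to the dataset.
    """
    rng = random.Random(seed)
    counts = defaultdict(int)
    for _ in range(runs):
        labels = kmeans_labels(points, k, rng)
        groups = defaultdict(list)
        for i, lab in enumerate(labels):
            groups[lab].append(i)
        for members in groups.values():
            for i, j in combinations(members, 2):
                counts[(i, j)] += 1
    return {pair: c / runs for pair, c in counts.items()}
```

In this sketch, a larger `k` yields smaller clusters and therefore fewer stored pairs, which is one way the similarity estimate can stay sparse; the values actually used in the paper are not given in this record.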
Keywords :
computational complexity; data analysis; matrix decomposition; pattern clustering; K-means accumulative consensus matrix; clustering capability; computational complexity; consensus spectral clustering; convergent property; data clustering problems; data management; data points; large-scale data analysis; near-linear time; nonnegative matrix factorization; similarity matrix; spectral analysis techniques; Clustering algorithms; Computational complexity; Laplace equations; Manifolds; Matrix decomposition; Sparse matrices; Spectral analysis;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Engineering (ICDE), 2011 IEEE 27th International Conference on
Conference_Location :
Hannover
ISSN :
1063-6382
Print_ISBN :
978-1-4244-8959-6
Electronic_ISBN :
1063-6382
Type :
conf
DOI :
10.1109/ICDE.2011.5767925
Filename :
5767925