Regional Information Center for Science and Technology - Consensus spectral clustering in near-linear time

DocumentCode :

3145005

Title :

Consensus spectral clustering in near-linear time

Author :

Luo, Dijun ; Ding, Chris ; Huang, Heng ; Nie, Feiping

Author_Institution :

Dept. of Comput. Sci. & Eng., Univ. of Texas at Arlington, Arlington, TX, USA

fYear :

2011

fDate :

11-16 April 2011

Firstpage :

1079

Lastpage :

1090

Abstract :

This paper addresses the scalability of spectral analysis, which is widely used in data management applications. Spectral analysis techniques offer powerful clustering capability but suffer from high computational complexity. In most previous research, the computational bottleneck of spectral analysis is the construction of the pairwise similarity matrix among objects, which costs at least O(n²), where n is the number of data points. In this paper, we propose a novel estimator of the similarity matrix based on the K-means accumulative consensus matrix, which is intrinsically sparse. The computational cost of the accumulative consensus matrix is O(n log n). We further develop a Non-negative Matrix Factorization approach to derive the clustering assignment. The overall complexity of our approach remains O(n log n). To validate our method, we (1) theoretically show the locality-preserving and convergence properties of the similarity estimator, (2) evaluate it on a large number of real-world datasets and compare the results with other state-of-the-art spectral analysis methods, and (3) apply it to large-scale data clustering problems. Results show that our approach uses much less computational time than other state-of-the-art clustering methods while providing comparable clustering quality. We also successfully apply our approach to a dataset of 5 million data points on a single machine in reasonable time. Our techniques open a new direction for high-quality large-scale data analysis.
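
The idea of an accumulative K-means consensus matrix can be illustrated with a minimal sketch. It assumes the similarity of two points is estimated as the fraction of K-means runs in which they land in the same cluster, and it stores only nonzero pairs, so the result is sparse. The function names (`kmeans_labels`, `consensus_matrix`), the parameters, and the toy data are hypothetical and not taken from the paper. The sketch does not attempt to reproduce the paper's O(n log n) construction or its Non-negative Matrix Factorization step.

```python
import random
from collections import defaultdict


def kmeans_labels(points, k, rng, iters=10):
    """Plain Lloyd's K-means; returns one cluster label per point."""
    centers = rng.sample(points, k)  # random initial centers drawn from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def consensus_matrix(points, k, runs, seed=0):
    """Sparse co-association estimate of similarity.

    Returns a dict mapping (i, j) with i < j to the fraction of K-means
    runs in which points i and j were assigned to the same cluster.
    Pairs that never co-occur are simply absent.
    """
    rng = random.Random(seed)
    counts = defaultdict(int)
    for _ in range(runs):
        clusters = defaultdict(list)
        for i, lab in enumerate(kmeans_labels(points, k, rng)):
            clusters[lab].append(i)
        for members in clusters.values():
            for a in range(len(members)):
                for b in range(a + 1, len(members)):
                    counts[(members[a], members[b])] += 1
    return {pair: c / runs for pair, c in counts.items()}


# Toy data (illustrative only): two well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
consensus = consensus_matrix(points, k=2, runs=5)
```

In this toy example only pairs inside the same group receive an entry, which is the sense in which such a matrix is sparse. The within-cluster pair enumeration used here is quadratic in cluster size, so it is practical only when clusters are small relative to n.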

Keywords :

computational complexity; data analysis; matrix decomposition; pattern clustering; K-means accumulative consensus matrix; clustering capability; computational complexity; consensus spectral clustering; convergent property; data clustering problems; data management; data points; large-scale data analysis; near-linear time; nonnegative matrix factorization; similarity matrix; spectral analysis techniques; Clustering algorithms; Computational complexity; Laplace equations; Manifolds; Matrix decomposition; Sparse matrices; Spectral analysis;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Data Engineering (ICDE), 2011 IEEE 27th International Conference on

Conference_Location :

Hannover

ISSN :

1063-6382

Print_ISBN :

978-1-4244-8959-6

Electronic_ISBN :

1063-6382

Type :

conf

DOI :

10.1109/ICDE.2011.5767925

Filename :

5767925

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3145005