DocumentCode :
1415343
Title :
Low-Rank Kernel Matrix Factorization for Large-Scale Evolutionary Clustering
Author :
Wang, Lijun ; Rege, Manjeet ; Dong, Ming ; Ding, Yongsheng
Author_Institution :
Dept. of Comput. Sci., Wayne State Univ., Detroit, MI, USA
Volume :
24
Issue :
6
fYear :
2012
fDate :
6/1/2012
Firstpage :
1036
Lastpage :
1050
Abstract :
Traditional clustering techniques are inapplicable to problems in which the relationships between data points evolve over time. Not only must a clustering algorithm adapt to recent changes in the evolving data, but it must also take the historical relationships between the data points into consideration. In this paper, we propose ECKF, a general framework for evolutionary clustering of large-scale data based on low-rank kernel matrix factorization. To the best of our knowledge, this is the first work that clusters large evolutionary data sets through the amalgamation of low-rank matrix approximation methods and matrix factorization-based clustering. Because the low-rank approximation provides a compact representation of the original matrix, and in particular the near-optimal low-rank approximation can preserve the sparsity of the original data, ECKF gains computational efficiency and is therefore applicable to large evolutionary data sets. Moreover, matrix factorization-based methods have been shown to cluster high-dimensional data effectively in text mining and multimedia data analysis. From a theoretical standpoint, we prove the convergence and correctness of ECKF and provide a detailed analysis of its computational efficiency in both time and space. Through extensive experiments on synthetic and real data sets, we show that ECKF outperforms existing evolutionary clustering methods.
Keywords :
approximation theory; computational complexity; data analysis; data mining; evolutionary computation; matrix decomposition; pattern clustering; ECKF; amalgamation; computational efficiency; data points; evolutionary clustering large-scale data; low-rank kernel matrix factorization; low-rank matrix approximation methods; multimedia data analysis; near-optimal low-rank approximation; real data sets; synthetic data sets; text mining; Accuracy; Approximation algorithms; Approximation methods; Clustering algorithms; Integrated circuits; Kernel; Matrix decomposition; Clustering; low-rank matrix approximation; matrix decomposition
fLanguage :
English
Journal_Title :
IEEE Transactions on Knowledge and Data Engineering
Publisher :
IEEE
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2010.258
Filename :
5677519