DocumentCode :
3492241
Title :
On the clustering of large-scale data: A matrix-based approach
Author :
Wang, Lijun ; Dong, Ming
Author_Institution :
Dept. of Comput. Sci., Wayne State Univ., Detroit, MI, USA
fYear :
2011
fDate :
July 31 2011-Aug. 5 2011
Firstpage :
139
Lastpage :
144
Abstract :
Nowadays, the analysis of large amounts of digital documents has become a hot research topic as libraries and databases, such as PUBMED and IEEE publications, are converted to electronic form. The ubiquitous phenomenon of massive data and sparse information imposes considerable challenges on data mining research. In this paper, we propose a theoretical framework, Exemplar-based Low-rank sparse Matrix Decomposition (ELMD), to cluster large-scale datasets. Specifically, given a data matrix, ELMD first computes a representative data subspace and a near-optimal low-rank approximation. Then, the cluster centroids and indicators are obtained through matrix decomposition, in which we require that the cluster centroids lie within the representative data subspace. From a theoretical perspective, we show the correctness and convergence of the ELMD algorithm and provide a detailed analysis of its efficiency. Through extensive experiments on both synthetic and real datasets, we demonstrate the superior performance of ELMD for clustering large-scale data.
Keywords :
approximation theory; data mining; data structures; document handling; matrix decomposition; pattern clustering; set theory; ubiquitous computing; ELMD algorithm; IEEE publication; PUBMED publication; cluster centroids; data matrix; data mining; data subspace; digital database; digital document; digital library; exemplar-based low rank sparse matrix decomposition; large scale data set clustering; matrix-based approach; near optimal low rank approximation; real dataset; sparse information; synthetic dataset; ubiquitous phenomenon; Accuracy; Approximation algorithms; Approximation methods; Clustering algorithms; Matrix decomposition; Noise; Sparse matrices;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Neural Networks (IJCNN), The 2011 International Joint Conference on
Conference_Location :
San Jose, CA
ISSN :
2161-4393
Print_ISBN :
978-1-4244-9635-8
Type :
conf
DOI :
10.1109/IJCNN.2011.6033212
Filename :
6033212