Title of article :
Approximate pairwise clustering for large data sets via sampling plus extension
Author/Authors :
Wang, Liang and Leckie, Christopher and Kotagiri, Ramamohanarao and Bezdek, James
Issue Information :
Journal with serial number, year 2011
Pages :
14
From page :
222
To page :
235
Abstract :
Pairwise clustering methods have shown great promise for many real-world applications. However, their computational demands make them impractical for large data sets. The contribution of this paper is a simple but efficient method, called eSPEC, that makes clustering feasible for problems involving large data sets. Our solution adopts a “sampling, clustering plus extension” strategy. The methodology starts by selecting a small number of representative samples from the relational pairwise data using a selective sampling scheme; the chosen samples are then grouped using a pairwise clustering algorithm combined with local scaling; finally, labels are extended to the remaining instances by treating the assignment as a classification problem in a low-dimensional space, which is explicitly learned from the labeled samples using a cluster-preserving graph embedding technique. Extensive experiments on several synthetic and real-world data sets demonstrate that the method makes approximate clustering of large data sets feasible and accelerates clustering of loadable data sets.
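The three-stage strategy described in the abstract can be illustrated with a minimal sketch. This is not eSPEC itself: every stage uses a deliberately simple stand-in, chosen here as an assumption. Farthest-first (maximin) traversal replaces the paper's selective sampling scheme, single-linkage merging replaces the pairwise clustering with local scaling, and nearest-sample assignment in the original dissimilarity space replaces classification in a learned graph embedding. All function names and the toy data are hypothetical.

```python
import math
import random

def maximin_sample(n, dist, m, seed=0):
    """Stage 1 (stand-in): pick m of n items by farthest-first traversal."""
    rng = random.Random(seed)
    chosen = [rng.randrange(n)]
    # nearest[i] = dissimilarity from item i to its closest chosen sample
    nearest = [dist(i, chosen[0]) for i in range(n)]
    while len(chosen) < m:
        nxt = max(range(n), key=lambda i: nearest[i])
        chosen.append(nxt)
        nearest = [min(nearest[i], dist(i, nxt)) for i in range(n)]
    return chosen

def single_linkage(samples, dist, k):
    """Stage 2 (stand-in): merge the closest pair of groups until k remain."""
    groups = [[s] for s in samples]
    while len(groups) > k:
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                d = min(dist(i, j) for i in groups[a] for j in groups[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        groups[a].extend(groups.pop(b))  # b > a, so index a stays valid
    return {s: label for label, g in enumerate(groups) for s in g}

def extend_labels(n, dist, sample_labels):
    """Stage 3 (stand-in): each item takes the label of its nearest sample."""
    labels = []
    for i in range(n):
        closest = min(sample_labels, key=lambda s: dist(i, s))
        labels.append(sample_labels[closest])
    return labels

# Toy data: two well-separated groups of 2-D points (illustrative only).
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0),
          (5.2, 4.9), (4.9, 5.3), (0.3, 0.2), (5.1, 5.1)]

def dist(i, j):
    """Pairwise dissimilarity: Euclidean distance between items i and j."""
    return math.hypot(points[i][0] - points[j][0], points[i][1] - points[j][1])

samples = maximin_sample(len(points), dist, m=4)
sample_labels = single_linkage(samples, dist, k=2)
labels = extend_labels(len(points), dist, sample_labels)
```

Only the clustering of the m samples costs more than linear time in this sketch; sampling and extension each need on the order of n·m dissimilarity evaluations, which is the general reason a sample-then-extend design scales to data sets where full pairwise clustering does not.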
Keywords :
Out-of-sample extension, Graph embedding, Pairwise data, Spectral clustering, Selective sampling
Journal title :
PATTERN RECOGNITION
Serial Year :
2011
Record number :
1733895