Title :
New approach for distributed clustering
Author :
Ghanem, Souhila ; Kechadi, Tahar ; Tari, A. Kamel
Author_Institution :
Dept. of Comput. Sci., Univ. of Bejaia, Bejaia, Algeria
Date :
June 29 2011-July 1 2011
Abstract :
Data collections are now huge and in most cases do not reside in a centralised location. This complicates the task of traditional data mining techniques, as datasets are distributed and often heterogeneous. In this paper we propose a distributed approach based on the aggregation of locally produced models. Each node processes its own dataset to produce local clusters, and global clusters are then constructed hierarchically. The aim of this approach is to minimise communication, maximise parallelism, balance the load among the nodes of the system, and reduce the overhead of the extra processing incurred by the hierarchical clustering. The technique is evaluated and compared with the sequential version on benchmark datasets, and the results are very promising.
Keywords :
data analysis; data mining; pattern clustering; resource allocation; benchmark datasets; centralised location; data collections; data mining techniques; distributed clustering; global clusters; hierarchical clustering; load balance; Clustering algorithms; Data mining; Distributed databases; Indexes; Niobium; Optics; Partitioning algorithms; Clustering; Data Mining; Distributed Data Mining; OPTICS
Conference_Title :
Spatial Data Mining and Geographical Knowledge Services (ICSDM), 2011 IEEE International Conference on
Conference_Location :
Fuzhou
Print_ISBN :
978-1-4244-8352-5
DOI :
10.1109/ICSDM.2011.5969005