Title :
Ensemble based distributed soft clustering
Author :
Visalakshi, N. Karthikeyani; Thangavel, K.
Author_Institution :
Dept. of Comput. Sci., Vellalar Coll. for Women, Erode
Abstract :
Due to the explosion in the number of autonomous data sources, there is a growing need for effective approaches to distributed knowledge discovery and data mining. A distributed clustering algorithm clusters distributed datasets without necessarily downloading all the data to a single site. Many applications can benefit from soft clustering, in which each object is assigned to multiple clusters with membership weights that sum to one. In this paper, a novel distributed soft clustering algorithm based on ensemble learning is proposed; it modifies the existing distributed K-Means algorithm to attain high-quality soft clusters. The proposed algorithm clusters multiple homogeneous data sources distributed over several local sites by combining the local clustering results. The fuzzy C-Means algorithm is used to cluster the local datasets, and the centroids of the individual datasets form an ensemble. The global centroids are obtained at the central site by clustering the local centroids with the K-Means algorithm, using a global value of K. The local soft clusters are then updated using the global centroids. Experiments are carried out on various datasets from the UCI Machine Learning Repository to compare the performance of the proposed algorithm with that of the conventional centralized fuzzy C-Means clustering algorithm.
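The four-stage pipeline described in the abstract (local fuzzy C-Means, an ensemble of local centroids, K-Means on that ensemble at the central site, then updating the local soft clusters) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`fuzzy_c_means`, `memberships`, `k_means`, `distributed_soft_clustering`), the fuzzifier m = 2, the per-site cluster count `c_local`, the deterministic farthest-first K-Means initialisation, and the reading of "updated using the global centroids" as recomputing standard fuzzy C-Means memberships against those centroids are all assumptions made for illustration.

```python
import numpy as np


def memberships(X, V, m=2.0, eps=1e-12):
    """Fuzzy C-Means memberships of points X (n, d) w.r.t. centroids V (c, d).

    u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)); each row sums to one.
    """
    d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) + eps  # (n, c)
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))     # (n, c, c)
    return 1.0 / ratio.sum(axis=2)


def fuzzy_c_means(X, c, m=2.0, iters=300, tol=1e-6, seed=0):
    """Standard fuzzy C-Means run at one local site; returns (centroids, U)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)  # random initial memberships, rows sum to 1
    for _ in range(iters):
        W = U ** m
        V = (W.T @ X) / W.sum(axis=0)[:, None]  # membership-weighted means
        U_new = memberships(X, V, m)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return V, U


def k_means(X, k, iters=100):
    """Plain K-Means with a deterministic farthest-first initialisation (assumed)."""
    idx = [0]
    for _ in range(1, k):
        dist = np.linalg.norm(X[:, None, :] - X[idx][None, :, :], axis=2).min(axis=1)
        idx.append(int(dist.argmax()))  # next seed: point farthest from chosen seeds
    C = X[idx].copy()
    for _ in range(iters):
        labels = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2).argmin(axis=1)
        new_C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                          for j in range(k)])  # keep old centroid if a cluster empties
        if np.allclose(new_C, C):
            break
        C = new_C
    return C


def distributed_soft_clustering(sites, c_local, k_global, m=2.0):
    # 1. Each site runs fuzzy C-Means locally; only its centroids are shared.
    local_centroids = [fuzzy_c_means(X, c_local, m, seed=i)[0] for i, X in enumerate(sites)]
    # 2. The central site stacks the local centroids into an ensemble ...
    ensemble = np.vstack(local_centroids)
    # 3. ... and clusters that ensemble with K-Means using the global K.
    global_centroids = k_means(ensemble, k_global)
    # 4. Each site recomputes its soft memberships against the global centroids.
    local_U = [memberships(X, global_centroids, m) for X in sites]
    return global_centroids, local_U


# Usage: two sites, each holding samples from the same two well-separated groups.
rng = np.random.default_rng(42)


def blob(center, n):
    return rng.normal(center, 0.3, size=(n, 2))


site_a = np.vstack([blob([0, 0], 30), blob([5, 5], 30)])
site_b = np.vstack([blob([0, 0], 20), blob([5, 5], 40)])
G, Us = distributed_soft_clustering([site_a, site_b], c_local=2, k_global=2)
print(np.round(G, 2))
```

In this sketch only the small `c_local x d` centroid matrix from each site reaches the central site, so the raw records never leave their sites; the price is that the global K must be fixed in advance.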
Keywords :
data mining; fuzzy set theory; pattern clustering; unsupervised learning; autonomous data source; distributed dataset; distributed k-means algorithm; distributed knowledge discovery; distributed soft clustering; ensemble learning; fuzzy c-means algorithm; global centroid; multiple homogeneous data source; unsupervised learning method; Clustering algorithms; Computer aided manufacturing; Computer science; Data mining; Educational institutions; Image segmentation; Machine learning algorithms; Partitioning algorithms; Robust stability; Unsupervised learning; Distributed Clustering; Fuzzy C-Means; Global Centroid; K-Means; Local Centroid
Conference_Title :
Computing, Communication and Networking, 2008. ICCCN 2008. International Conference on
Conference_Location :
St. Thomas, VI
Print_ISBN :
978-1-4244-3594-4
Electronic_ISBN :
978-1-4244-3595-1
DOI :
10.1109/ICCCNET.2008.4787679