• DocumentCode
    2777981
  • Title
    Ensemble based distributed soft clustering
  • Author
    Visalakshi, N. Karthikeyani; Thangavel, K.
  • Author_Institution
    Dept. of Comput. Sci., Vellalar Coll. for Women, Erode
  • fYear
    2008
  • fDate
    18-20 Dec. 2008
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    Due to the explosion in the number of autonomous data sources, there is a growing need for effective approaches to distributed knowledge discovery and data mining. A distributed clustering algorithm clusters distributed datasets without necessarily downloading all the data to a single site. Many applications can benefit from soft clustering, where each object is assigned to multiple clusters with membership weights that sum to one. In this paper, a novel distributed soft clustering algorithm based on ensemble learning is proposed by modifying the existing distributed K-Means algorithm to attain high-quality soft clusters. The proposed algorithm is able to cluster multiple homogeneous data sources distributed over several local sites by combining local clustering results. The fuzzy C-Means algorithm is used to cluster the local datasets, and the centroids of the individual datasets form an ensemble. The global centroid is obtained by clustering the local centroids using the K-Means algorithm with a global K value at a central site. The local soft clusters are then updated using the global centroid. Experiments are carried out on various datasets from the UCI machine learning repository to compare the performance of the proposed algorithm with that of the conventional centralized fuzzy C-Means clustering algorithm.
  • Keywords
    data mining; fuzzy set theory; pattern clustering; unsupervised learning; autonomous data source; data mining; distributed dataset; distributed k-mean algorithm; distributed knowledge discovery; distributed soft clustering; ensemble learning; fuzzy c-mean algorithm; global centroid; multiple homogeneous data source; unsupervised learning method; Clustering algorithms; Computer aided manufacturing; Computer science; Data mining; Educational institutions; Image segmentation; Machine learning algorithms; Partitioning algorithms; Robust stability; Unsupervised learning; Distributed Clustering; Fuzzy C-Means; Global Centroid; K-Means; Local Centroid;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computing, Communication and Networking, 2008. ICCCn 2008. International Conference on
  • Conference_Location
    St. Thomas, VI
  • Print_ISBN
    978-1-4244-3594-4
  • Electronic_ISBN
    978-1-4244-3595-1
  • Type
    conf
  • DOI
    10.1109/ICCCNET.2008.4787679
  • Filename
    4787679
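
The pipeline outlined in the abstract (fuzzy C-Means at each local site, K-Means over the pooled local centroids at a central site, then an update of the local soft clusters from the global centroid) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the fuzzifier m = 2, the random initialisation, the fixed iteration counts, the choice of K local clusters per site, and the synthetic two-blob data are all assumptions made for illustration. The record does not say how the local soft clusters are updated; here the memberships are simply recomputed against the global centroids.

```python
import numpy as np

def memberships(X, centroids, m=2.0):
    """Fuzzy membership matrix (points x clusters); each row sums to one."""
    # Squared Euclidean distance from every point to every centroid.
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    d2 = np.maximum(d2, 1e-12)           # guard against division by zero
    inv = d2 ** (-1.0 / (m - 1.0))       # standard fuzzy C-Means weighting
    return inv / inv.sum(axis=1, keepdims=True)

def fuzzy_c_means(X, c, m=2.0, n_iter=100, seed=0):
    """Return c centroids of X found by plain fuzzy C-Means (assumed settings)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=c, replace=False)]
    for _ in range(n_iter):
        w = memberships(X, centroids, m) ** m
        centroids = (w.T @ X) / w.sum(axis=0)[:, None]
    return centroids

def k_means(P, k, n_iter=100, seed=0):
    """Return k centroids of P found by plain (Lloyd's) K-Means."""
    rng = np.random.default_rng(seed)
    centroids = P[rng.choice(len(P), size=k, replace=False)]
    for _ in range(n_iter):
        d2 = ((P[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):      # leave an empty cluster where it is
                centroids[j] = P[labels == j].mean(axis=0)
    return centroids

def distributed_soft_clustering(sites, k, m=2.0):
    """Hypothetical driver: only centroids, never raw data, leave a site."""
    # Step 1: each site clusters its own data with fuzzy C-Means.
    local = [fuzzy_c_means(X, k, m, seed=i) for i, X in enumerate(sites)]
    # Step 2: the central site clusters the ensemble of local centroids.
    global_centroids = k_means(np.vstack(local), k)
    # Step 3 (assumed update rule): each site recomputes its memberships.
    local_memberships = [memberships(X, global_centroids, m) for X in sites]
    return global_centroids, local_memberships

# Synthetic stand-in for three homogeneous sites, each holding two blobs.
rng = np.random.default_rng(1)
sites = [
    np.vstack([rng.normal(0.0, 0.3, (30, 2)), rng.normal(5.0, 0.3, (30, 2))])
    for _ in range(3)
]
global_centroids, local_memberships = distributed_soft_clustering(sites, k=2)
```

Only the small array of local centroids is pooled in step 2, which is what lets the sites keep their raw data local while still agreeing on a shared set of soft clusters.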