• DocumentCode
    3495720
  • Title

    Distributed Clustering for Data Sources with Diverse Schema

  • Author

    Visalakshi, N. Karthikeyani ; Thangavel, K. ; Alagambigai, P.

  • Author_Institution
    Dept. of Comput. Sci., Vellalar Coll. For Women
  • Volume
    1
  • fYear
    2008
  • fDate
    11-13 Nov. 2008
  • Firstpage
    1058
  • Lastpage
    1063
  • Abstract
    Many enterprises incorporate information gathered from a variety of data sources into an integrated input for some learning task. For example, when designing an automated diagnostic tool for certain diseases, one may wish to integrate data gathered from many different hospitals. Analyzing and mining these distributed heterogeneous data sources requires distributed machine learning and data mining techniques. In this paper, a Modified Distributed Combining Algorithm is proposed to cluster disparate data sources that have diverse, possibly overlapping sets of features and need not share objects. First, all objects located at each local site are grouped using the K-Means or Fuzzy C-Means clustering algorithm, and the resulting centroids are taken as the local models. Then, the set of centroids is transformed into a unified structure, and optimum values are assigned to missing attributes. Finally, the global cluster centroids are computed to identify the global cluster model based on cluster ensemble and centroid mapping. Experiments are carried out on various datasets from the UCI machine learning data repository to evaluate the efficiency of the proposed algorithm.
  • Keywords
    data mining; distributed processing; fuzzy set theory; learning (artificial intelligence); pattern clustering; automated diagnostic tool; centroid mapping; cluster ensemble; data mining technique; distributed data sources clustering; distributed heterogeneous data sources; distributed machine learning; diverse schema; fuzzy c-means clustering algorithm; k-means clustering algorithm; learning task; modified distributed combining algorithm; Clustering algorithms; Computer science; Couplings; Data mining; Distributed decision making; Machine learning; Machine learning algorithms; Partitioning algorithms; Robust stability; Unsupervised learning; Distributed Clustering; Diverse Schema; Global Centroid; K-Means; Local Centroid;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Convergence and Hybrid Information Technology, 2008. ICCIT '08. Third International Conference on
  • Conference_Location
    Busan
  • Print_ISBN
    978-0-7695-3407-7
  • Type

    conf

  • DOI
    10.1109/ICCIT.2008.282
  • Filename
    4682173
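
The three steps outlined in the abstract (local clustering, mapping centroids onto a unified structure, and computing global centroids) can be sketched roughly as below. This is a minimal illustration, not the paper's Modified Distributed Combining Algorithm. The abstract does not say how the "optimum values" for missing attributes are chosen, so the sketch assumes the mean of each attribute over the centroids that do have it. It also substitutes a second plain K-Means pass over the unified centroids for the cluster ensemble and centroid mapping step. All function names, feature names, and the sample data are hypothetical.

```python
from math import dist


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means, seeded with the first k points (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: dist(p, centroids[j]))
            groups[nearest].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the previous centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


def unify(local_models):
    """Map local centroids onto the union of all features.

    ASSUMPTION: a missing attribute is filled with the mean of that
    attribute over the centroids that have it. The abstract only says
    "optimum values" and does not define them.
    """
    schema = sorted({f for feats, _ in local_models for f in feats})
    rows = [dict(zip(feats, c)) for feats, cents in local_models for c in cents]
    fill = {}
    for f in schema:
        known = [r[f] for r in rows if f in r]
        fill[f] = sum(known) / len(known)
    return schema, [[r.get(f, fill[f]) for f in schema] for r in rows]


def distributed_cluster(sites, k):
    """Step 1: local k-means per site. Step 2: unify. Step 3: global centroids."""
    local_models = [(feats, kmeans(data, k)) for feats, data in sites]
    schema, unified = unify(local_models)
    # ASSUMPTION: a second k-means pass stands in for the paper's
    # cluster-ensemble and centroid-mapping step.
    return schema, kmeans(unified, k)


# Two hypothetical sites with overlapping but different feature sets.
sites = [
    (["age", "bp"], [[30, 120], [32, 118], [60, 150], [62, 152]]),
    (["bp", "chol"], [[119, 180], [121, 182], [151, 240], [149, 238]]),
]
schema, global_centroids = distributed_cluster(sites, k=2)
print(schema)
print(global_centroids)
```

Only centroids leave each site in this sketch, never raw objects, which is the property that lets sites with different schemas and no shared objects take part. Mean filling is the simplest possible choice and may pull centroids toward the middle of a missing dimension; any other imputation rule could be dropped into `unify` without touching the other two steps.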