DocumentCode
3495720
Title
Distributed Clustering for Data Sources with Diverse Schema
Author
Visalakshi, N. Karthikeyani ; Thangavel, K. ; Alagambigai, P.
Author_Institution
Dept. of Comput. Sci., Vellalar Coll. For Women
Volume
1
fYear
2008
fDate
11-13 Nov. 2008
Firstpage
1058
Lastpage
1063
Abstract
Many enterprises incorporate information gathered from a variety of data sources into an integrated input for some learning task. For example, when designing an automated diagnostic tool for certain diseases, one may wish to integrate data gathered from many different hospitals. Analyzing and mining these distributed heterogeneous data sources requires distributed machine learning and data mining techniques. In this paper, a Modified Distributed Combining Algorithm is proposed to cluster disparate data sources that have diverse, possibly overlapping sets of features and need not share objects. First, all objects located at each local site are grouped using the K-Means or Fuzzy C-Means clustering algorithm, and the resulting centroids are treated as local models. Then, the set of centroids is transformed into a unified structure, and optimum values are assigned to missing attributes. Finally, global cluster centroids are computed to identify the global cluster model, based on cluster ensemble and centroid mapping. Experiments are carried out on various datasets from the UCI machine learning repository to evaluate the efficiency of the proposed algorithm.
Keywords
data mining; distributed processing; fuzzy set theory; learning (artificial intelligence); pattern clustering; automated diagnostic tool; centroid mapping; cluster ensemble; data mining technique; distributed data sources clustering; distributed heterogeneous data sources; distributed machine learning; diverse schema; fuzzy c-means clustering algorithm; k-means clustering algorithm; learning task; modified distributed combining algorithm; Clustering algorithms; Computer science; Couplings; Data mining; Distributed decision making; Machine learning; Machine learning algorithms; Partitioning algorithms; Robust stability; Unsupervised learning; Distributed Clustering; Diverse Schema; Global Centroid; K-Means; Local Centroid;
fLanguage
English
Publisher
IEEE
Conference_Titel
Convergence and Hybrid Information Technology, 2008. ICCIT '08. Third International Conference on
Conference_Location
Busan
Print_ISBN
978-0-7695-3407-7
Type
conf
DOI
10.1109/ICCIT.2008.282
Filename
4682173
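The three steps named in the abstract (local clustering, unifying centroids over diverse schemas, computing global centroids) can be illustrated with a minimal Python sketch. This is not the authors' Modified Distributed Combining Algorithm: the abstract does not say how the "optimum values" for missing attributes are chosen or how the cluster ensemble and centroid mapping work, so the sketch assumes column-mean filling and a second K-Means pass over the pooled local centroids. The function names (`kmeans`, `unify`, `fill_missing`), the feature names, and the synthetic data are all hypothetical.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's K-Means; returns a (k, n_features) centroid array."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign each object to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids

def unify(local_models, all_features):
    """Stack local centroids into one table over the union of features.
    Attributes a site does not record are marked NaN."""
    blocks = []
    for features, centroids in local_models:
        block = np.full((len(centroids), len(all_features)), np.nan)
        for col, name in enumerate(features):
            block[:, all_features.index(name)] = centroids[:, col]
        blocks.append(block)
    return np.vstack(blocks)

def fill_missing(table):
    """ASSUMPTION: replace each NaN with the mean of the observed values
    in that column. The abstract only says 'optimum values'."""
    col_means = np.nanmean(table, axis=0)
    return np.where(np.isnan(table), col_means, table)

# Two hypothetical sites with overlapping schemas and no shared objects.
rng = np.random.default_rng(1)
site_a = np.vstack([rng.normal([30, 110], 1.0, (20, 2)),
                    rng.normal([60, 150], 1.0, (20, 2))])   # age, bp
site_b = np.vstack([rng.normal([112, 180], 1.0, (20, 2)),
                    rng.normal([148, 260], 1.0, (20, 2))])  # bp, chol

# Step 1: each site clusters its own objects; centroids are the local models.
local_models = [(["age", "bp"], kmeans(site_a, k=2)),
                (["bp", "chol"], kmeans(site_b, k=2))]

# Step 2: unified structure over all features, then fill missing attributes.
all_features = ["age", "bp", "chol"]
unified = unify(local_models, all_features)
filled = fill_missing(unified)

# Step 3 (ASSUMPTION): cluster the pooled local centroids to get global ones.
global_centroids = kmeans(filled, k=2)
print(global_centroids)
```

Only the local centroids leave each site in this sketch, which is the usual motivation for centroid-based distributed clustering: the raw objects never need to be pooled.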