• DocumentCode
    968549
  • Title

    Approximate Distributed K-Means Clustering over a Peer-to-Peer Network

  • Author

    Datta, Souptik ; Giannella, Chris R. ; Kargupta, Hillol

  • Author_Institution
    Ecompex Inc., McLean, VA, USA
  • Volume
    21
  • Issue
    10
  • fYear
    2009
  • Firstpage
    1372
  • Lastpage
    1388
  • Abstract
    Data-intensive peer-to-peer (P2P) networks are finding an increasing number of applications. Data mining in such P2P environments is a natural extension. However, common monolithic data mining architectures do not fit well in such environments, since they typically require centralizing the distributed data, which is usually not practical in a large P2P network. Distributed data mining algorithms that avoid large-scale synchronization or data centralization offer an alternative. This paper considers the distributed K-means clustering problem where the data and computing resources are distributed over a large P2P network. It offers two algorithms that approximate the result produced by the standard centralized K-means clustering algorithm. The first is designed to operate in a dynamic P2P network and produces clusterings using "local" synchronization only. The second algorithm uses uniformly sampled peers and provides analytical guarantees regarding the accuracy of clustering on a P2P network. Empirical results show that both algorithms perform well compared to their centralized counterparts at a modest communication cost.
  • Keywords
    data mining; pattern clustering; peer-to-peer computing; data centralization; data intensive peer-to-peer networks; data mining architectures; distributed k-means clustering; dynamic P2P network; Algorithm design and analysis; Approximation algorithms; Clustering algorithms; Computer networks; Costs; Data mining; Distributed computing; History; Large-scale systems; Peer to peer computing; Algorithm/protocol design and analysis; Algorithms for data and knowledge management; Clustering; Distributed programming; Distributed systems; Elementary function approximation; Knowledge management applications; Mining methods and algorithms; Peer-to-peer data mining; Ubiquitous computing; Web mining; distributed K-means clustering
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/TKDE.2008.222
  • Filename
    4663068