• DocumentCode
    2731367
  • Title
    Conquering the Divide: Continuous Clustering of Distributed Data Streams
  • Author
    Cormode, G. ; Muthukrishnan, S. ; Wei Zhuang
  • Author_Institution
    AT&T Labs. Res., Murray Hill, NJ, USA
  • fYear
    2007
  • fDate
    15-20 April 2007
  • Firstpage
    1036
  • Lastpage
    1045
  • Abstract
    Data is often collected over a distributed network, but in many cases it is so voluminous that it is impractical and undesirable to collect it in a central location. Instead, we must perform distributed computations over the data, guaranteeing high-quality answers even as new data arrives. In this paper, we formalize and study the problem of maintaining a clustering of such distributed data that is continuously evolving. In particular, our goal is to minimize the communication and computational cost while still providing guaranteed accuracy of the clustering. We focus on k-center clustering, and provide a suite of algorithms that vary based on which centralized algorithm they derive from, and whether they maintain a single global clustering or many local clusterings that can be merged together. We show that these algorithms can be designed to give accuracy guarantees that are close to the best possible even in the centralized case. In our experiments, we see clear trends among these algorithms, showing that the choice of algorithm is crucial, and that we can achieve a clustering that is as good as the best centralized clustering with only a small fraction of the communication required to collect all the data in a single location.
  • Keywords
    data handling; distributed data stream continuous clustering; distributed network; k-center clustering; Algorithm design and analysis; Clustering algorithms; Computational efficiency; Data acquisition; Distributed computing; Educational institutions; Electronic mail; High performance computing; Monitoring; Underwater tracking
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering, 2007. ICDE 2007. IEEE 23rd International Conference on
  • Conference_Location
    Istanbul
  • Print_ISBN
    1-4244-0802-4
  • Type
    conf
  • DOI
    10.1109/ICDE.2007.368962
  • Filename
    4221752