  • DocumentCode
    2369488
  • Title
    Privacy-preserving distributed clustering using generative models
  • Author
    Merugu, Srujana ; Ghosh, Joydeep
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Texas Univ., Austin, TX, USA
  • fYear
    2003
  • fDate
    19-22 Nov. 2003
  • Firstpage
    211
  • Lastpage
    218
  • Abstract
    We present a framework for clustering distributed data in unsupervised and semisupervised scenarios, taking into account privacy requirements and communication costs. Rather than sharing parts of the original or perturbed data, we instead transmit the parameters of suitable generative models built at each local data site to a central location. We mathematically show that the best representative of all the data is a certain "mean" model, and empirically show that this model can be approximated quite well by generating artificial samples from the underlying distributions using Markov Chain Monte Carlo techniques, and then fitting a combined global model with a chosen parametric form to these samples. We also propose a new measure that quantifies privacy based on information theoretic concepts, and show that decreasing privacy leads to a higher quality of the combined model and vice versa. We provide empirical results on different data types to highlight the generality of our framework. The results show that high quality distributed clustering can be achieved with little privacy loss and low communication cost.
  • Keywords
    Markov processes; Monte Carlo methods; data mining; data privacy; distributed databases; statistical analysis; Markov Chain; Monte Carlo techniques; communication cost; distributed clustering; generative model; local data site; perturbed data; semisupervised scenarios; unsupervised scenarios; Clustering algorithms; Costs; Distributed power generation; Fitting; Law; Mathematical model
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, 2003. ICDM 2003. Third IEEE International Conference on
  • Print_ISBN
    0-7695-1978-4
  • Type
    conf
  • DOI
    10.1109/ICDM.2003.1250922
  • Filename
    1250922