• DocumentCode
    3717343
  • Title
    Multi-probe random projection clustering to secure very large distributed datasets
  • Author
    Lee A. Carraher; Philip A. Wilsey; Anindya Moitra; Sayantan Dey
  • Author_Institution
    University of Cincinnati, Cincinnati, OH 45221-0030
  • fYear
    2015
  • Firstpage
    1891
  • Lastpage
    1900
  • Abstract
    This paper presents a solution to the approximate k-means clustering problem for very large distributed datasets. Distributed data models have gained popularity in recent years following the efforts of commercial, academic, and government organizations to make data more widely accessible. Due to the sheer volume of available data, in-memory single-core computation quickly becomes infeasible, requiring distributed multiprocessing. Our solution achieves clustering performance comparable to that of other popular clustering algorithms, with improved overall complexity growth, while remaining amenable to distributed processing frameworks such as MapReduce. Our solution also maintains certain data-privacy guarantees against deanonymization.
  • Keywords
    "Clustering algorithms","Lattices","Approximation algorithms","Distributed databases","Algorithm design and analysis","Partitioning algorithms","Complexity theory"
  • Publisher
    IEEE
  • Conference_Title
    Big Data (Big Data), 2015 IEEE International Conference on
  • Type
    conf
  • DOI
    10.1109/BigData.2015.7363964
  • Filename
    7363964