DocumentCode
3717343
Title
Multi-probe random projection clustering to secure very large distributed datasets
Author
Lee A. Carraher;Philip A. Wilsey;Anindya Moitra;Sayantan Dey
Author_Institution
University of Cincinnati, Cincinnati, OH 45221-0030
fYear
2015
Firstpage
1891
Lastpage
1900
Abstract
This paper presents a solution to the approximate k-means clustering problem for very large distributed datasets. Distributed data models have gained popularity in recent years following the efforts of commercial, academic and government organizations to make data more widely accessible. Due to the sheer volume of available data, in-memory single-core computation quickly becomes infeasible, requiring distributed multiprocessing. Our solution achieves clustering performance comparable to that of other popular clustering algorithms, with improved overall complexity growth, while being amenable to distributed processing frameworks such as MapReduce. Our solution also maintains certain data privacy guarantees with respect to deanonymization.
Keywords
"Clustering algorithms","Lattices","Approximation algorithms","Distributed databases","Algorithm design and analysis","Partitioning algorithms","Complexity theory"
Publisher
ieee
Conference_Titel
Big Data (Big Data), 2015 IEEE International Conference on
Type
conf
DOI
10.1109/BigData.2015.7363964
Filename
7363964
Link To Document