DocumentCode :
228716
Title :
Pardicle: Parallel Approximate Density-Based Clustering
Author :
Patwary, M. Mostofa Ali ; Satish, Nadathur ; Sundaram, Narayanan ; Manne, Fredrik ; Habib, Salman ; Dubey, Pradeep
fYear :
2014
fDate :
16-21 Nov. 2014
Firstpage :
560
Lastpage :
571
Abstract :
DBSCAN is a widely used density-based clustering algorithm for particle data, well known for its ability to isolate arbitrarily shaped clusters and to filter noise data. The algorithm is super-linear (O(n log n)) and computationally expensive for large datasets. Given the need for speed, we propose a fast heuristic algorithm for DBSCAN using density-based sampling, which performs equally well in quality compared to exact algorithms, but is more than an order of magnitude faster. Our experiments on astrophysics and synthetic massive datasets (8.5 billion numbers) show that our approximate algorithm is up to 56× faster than exact algorithms with almost identical quality (Omega-Index ≥ 0.99). We develop a new parallel DBSCAN algorithm, which uses dynamic partitioning to improve load balancing and locality. We demonstrate near-linear speedup on shared memory (15× using 16 cores, single node Intel® Xeon® processor) and distributed memory (3917× using 4096 cores, multinode) computers, with 2× additional performance improvement using Intel® Xeon Phi™ coprocessors. Additionally, existing exact algorithms can achieve up to 3.4× speedup using dynamic partitioning.
Keywords :
approximation theory; computational complexity; distributed shared memory systems; pattern clustering; resource allocation; sampling methods; Intel Xeon Phi coprocessor; approximate algorithm; arbitrarily-shaped cluster; astrophysics; density based sampling; density-based clustering algorithm; distributed memory; dynamic partitioning; exact algorithm; heuristic algorithm; load balancing; load locality; multinode computer; near-linear speedup; noise data filter; parallel DBSCAN algorithm; parallel approximate density-based clustering; pardicle; particle data; performance improvement; shared memory; single node Intel Xeon processor; synthetic massive datasets; Approximation algorithms; Approximation methods; Clustering algorithms; Data structures; Heuristic algorithms; Instruction sets; Partitioning algorithms; Density based clustering; Disjoint-set data structure; Union-Find algorithm; approximate clustering algorithm;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
High Performance Computing, Networking, Storage and Analysis, SC14: International Conference for
Conference_Location :
New Orleans, LA
Print_ISBN :
978-1-4799-5499-5
Type :
conf
DOI :
10.1109/SC.2014.51
Filename :
7013033