Regional Information Center for Science and Technology - Pardicle: Parallel Approximate Density-Based Clustering

DocumentCode :

228716

Title :

Pardicle: Parallel Approximate Density-Based Clustering

Author :

Patwary, M. Mostofa Ali ; Satish, Nadathur ; Sundaram, Narayanan ; Manne, Fredrik ; Habib, Salman ; Dubey, Pradeep

fYear :

2014

fDate :

16-21 Nov. 2014

Firstpage :

560

Lastpage :

571

Abstract :

DBSCAN is a widely used density-based clustering algorithm for particle data, well known for its ability to isolate arbitrarily shaped clusters and to filter noise data. The algorithm is super-linear (O(n log n)) and computationally expensive for large datasets. Given the need for speed, we propose a fast heuristic algorithm for DBSCAN using density-based sampling, which matches the quality of exact algorithms but is more than an order of magnitude faster. Our experiments on astrophysics and synthetic massive datasets (8.5 billion numbers) show that our approximate algorithm is up to 56× faster than exact algorithms with almost identical quality (Omega-Index ≥ 0.99). We develop a new parallel DBSCAN algorithm, which uses dynamic partitioning to improve load balancing and locality. We demonstrate near-linear speedup on shared-memory (15× using 16 cores, single-node Intel® Xeon® processor) and distributed-memory (3917× using 4096 cores, multinode) computers, with 2× additional performance improvement using Intel® Xeon Phi™ coprocessors. Additionally, existing exact algorithms can achieve up to 3.4× speedup using dynamic partitioning.
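
For readers unfamiliar with the baseline, the following is a minimal sketch of exact DBSCAN expressed with a disjoint-set (Union-Find) structure, the two ideas named in this record's abstract and keywords. It is not the Pardicle implementation: it uses a brute-force O(n^2) neighbor search and omits the density-based sampling, dynamic partitioning, and parallelism the abstract describes. The function names and the parameters `eps` and `min_pts` are assumptions chosen for illustration.

```python
from math import dist


def find(parent, i):
    """Return the root of i's set, halving paths along the way."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def union(parent, a, b):
    """Merge the sets containing a and b."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[rb] = ra


def dbscan_union_find(points, eps, min_pts):
    """Brute-force exact DBSCAN; returns one label per point (-1 = noise)."""
    n = len(points)
    # O(n^2) neighbor search; each point counts as its own neighbor.
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    core = [len(nb) >= min_pts for nb in neighbors]

    parent = list(range(n))
    # Core points within eps of each other belong to the same cluster.
    for i in range(n):
        if core[i]:
            for j in neighbors[i]:
                if core[j]:
                    union(parent, i, j)

    labels = [-1] * n
    cluster_ids = {}
    for i in range(n):
        if core[i]:
            root = find(parent, i)
            labels[i] = cluster_ids.setdefault(root, len(cluster_ids))
    # Border points join the cluster of the first core neighbor found.
    for i in range(n):
        if not core[i]:
            for j in neighbors[i]:
                if core[j]:
                    labels[i] = labels[j]
                    break
    return labels
```

Because cluster membership is built from independent union operations on core points, this formulation lends itself to merging partial results computed on separate partitions, which is one plausible reason a Union-Find structure appears among the keywords.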

Keywords :

approximation theory; computational complexity; distributed shared memory systems; pattern clustering; resource allocation; sampling methods; Intel Xeon Phi coprocessor; approximate algorithm; arbitrarily-shaped cluster; astrophysics; density based sampling; density-based clustering algorithm; distributed memory; dynamic partitioning; exact algorithm; heuristic algorithm; load balancing; load locality; multinode computer; near-linear speedup; noise data filter; parallel DBSCAN algorithm; parallel approximate density-based clustering; pardicle; particle data; performance improvement; shared memory; single node Intel Xeon processor; synthetic massive datasets; Approximation algorithms; Approximation methods; Clustering algorithms; Data structures; Heuristic algorithms; Instruction sets; Partitioning algorithms; Density based clustering; Disjoint-set data structure; Union-Find algorithm; approximate clustering algorithm;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

High Performance Computing, Networking, Storage and Analysis, SC14: International Conference for

Conference_Location :

New Orleans, LA

Print_ISBN :

978-1-4799-5499-5

Type :

conf

DOI :

10.1109/SC.2014.51

Filename :

7013033

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=228716