DocumentCode :
692922
Title :
Mr. Scan: Extreme scale density-based clustering using a tree-based network of GPGPU nodes
Author :
Welton, Benjamin ; Samanas, Evan ; Miller, Barton P.
Author_Institution :
Comput. Sci. Dept., Univ. of Wisconsin, Madison, WI, USA
fYear :
2013
fDate :
17-22 Nov. 2013
Firstpage :
1
Lastpage :
11
Abstract :
Density-based clustering algorithms are a widely used class of data mining techniques that can find irregularly shaped clusters and can cluster data without prior knowledge of the number of clusters it contains. DBSCAN is the best-known density-based clustering algorithm. We introduce our version of DBSCAN, called Mr. Scan, which uses a hybrid parallel implementation that combines the MRNet tree-based distribution network with GPGPU-equipped nodes. Mr. Scan avoids the problems of existing implementations by effectively partitioning the point space and by optimizing DBSCAN's computation over dense data regions. We tested Mr. Scan on both a geolocated Twitter dataset and image data obtained from the Sloan Digital Sky Survey. At its largest scale, Mr. Scan clustered 6.5 billion points from the Twitter dataset on 8,192 GPU nodes of Cray Titan in 17.3 minutes. All other parallel DBSCAN implementations have only demonstrated the ability to cluster up to 100 million points.
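For readers unfamiliar with DBSCAN, the following is a minimal sketch of the textbook sequential algorithm, showing how clusters grow outward from "core" points without the number of clusters being fixed in advance. It is not Mr. Scan's implementation: it uses brute-force O(n^2) neighbor queries and involves no MRNet tree, partitioning, or GPU. The function name `dbscan` and the parameters `eps` (neighborhood radius) and `min_pts` (minimum neighbors for a core point) are illustrative assumptions following common DBSCAN terminology.

```python
from math import dist

NOISE = -1  # label for points that belong to no cluster


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN sketch with brute-force region queries (O(n^2)).

    Returns one label per point: a cluster id (0, 1, ...) or NOISE.
    """
    labels = [None] * len(points)  # None means "not yet visited"
    cluster_id = 0

    def region_query(i):
        # Indices of all points within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue

        # i is a core point: start a new cluster and expand it.
        labels[i] = cluster_id
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id  # reachable non-core point: border
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster_id
            j_neighbors = region_query(j)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also core: keep growing
        cluster_id += 1
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(dbscan(pts, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, -1]
```

The two dense groups become clusters 0 and 1 and the isolated point is labeled noise, even though the number of clusters was never supplied. The repeated all-pairs neighbor search is the cost that parallel implementations such as the one described above aim to reduce.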
Keywords :
data mining; graphics processing units; parallel algorithms; pattern clustering; social networking (online); trees (mathematics); Cray Titan; DBSCAN; GPGPU-equipped nodes; MRNet tree-based distribution network; Sloan Digital Sky Survey; data mining techniques; density-based clustering algorithms; extreme scale density-based clustering; geolocated Twitter dataset; hybrid parallel implementation; irregularly shaped clusters; Mr. Scan; tree-based network; Algorithm design and analysis; Clustering algorithms; Distributed databases; Noise; Optimization; Partitioning algorithms; Spatial indexes;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
2013 International Conference for High Performance Computing, Networking, Storage and Analysis (SC)
Conference_Location :
Denver, CO
Print_ISBN :
978-1-4503-2378-9
Type :
conf
DOI :
10.1145/2503210.2503262
Filename :
6877517