Title :
K-Means for Parallel Architectures Using All-Prefix-Sum Sorting and Updating Steps
Author :
Kohlhoff, K.J. ; Pande, V.S. ; Altman, R.B.
Author_Institution :
Dept. of Bioeng., Stanford Univ., Stanford, CA, USA
Abstract :
We present Kps-means, an implementation of parallel K-means clustering that achieves high performance with near-full-occupancy compute kernels without limiting the number of dimensions or data points permitted as input, thus combining flexibility with a high degree of parallelism and efficiency. As a key element of this performance improvement, we introduce parallel sorting as data preprocessing and updating steps. Our final implementation for Nvidia GPUs achieves speedups of up to 200-fold over CPU reference code and of up to three orders of magnitude over popular numerical software packages.
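The abstract names all-prefix-sum sorting as the key preprocessing and updating step but does not show how it works. The following is a minimal serial sketch of one common way a prefix sum can drive a sort of points by cluster label, so that each cluster occupies a contiguous segment that can be averaged independently. It is not the authors' Kps-means code: the function names (`exclusive_prefix_sum`, `sort_by_label`, `update_centroids`), the tuple-based data layout, and the handling of empty clusters are assumptions made for illustration. On a GPU, the histogram, scan, and scatter stages would each run as parallel kernels.

```python
def exclusive_prefix_sum(counts):
    """Exclusive scan: out[i] is the sum of counts[0..i-1]."""
    out, total = [], 0
    for c in counts:
        out.append(total)
        total += c
    return out


def sort_by_label(points, labels, k):
    """Counting sort of points by cluster label, driven by a prefix sum."""
    # Histogram: how many points carry each label.
    counts = [0] * k
    for lab in labels:
        counts[lab] += 1
    # The exclusive scan gives the start index of each cluster's segment.
    offsets = exclusive_prefix_sum(counts)
    # Scatter each point into the next free slot of its cluster's segment.
    cursor = list(offsets)
    sorted_points = [None] * len(points)
    for p, lab in zip(points, labels):
        sorted_points[cursor[lab]] = p
        cursor[lab] += 1
    return sorted_points, offsets, counts


def update_centroids(sorted_points, offsets, counts, old_centroids):
    """Average each contiguous segment (assumption: an empty cluster keeps its old centroid)."""
    new_centroids = []
    for c, (start, n) in enumerate(zip(offsets, counts)):
        if n == 0:
            new_centroids.append(old_centroids[c])
            continue
        segment = sorted_points[start:start + n]
        dim = len(segment[0])
        new_centroids.append(
            tuple(sum(p[d] for p in segment) / n for d in range(dim))
        )
    return new_centroids


if __name__ == "__main__":
    points = [(0.0, 0.0), (10.0, 10.0), (2.0, 0.0), (10.0, 12.0)]
    labels = [0, 1, 0, 1]
    old = [(0.0, 0.0), (9.0, 9.0), (5.0, 5.0)]
    sorted_points, offsets, counts = sort_by_label(points, labels, k=3)
    print(update_centroids(sorted_points, offsets, counts, old))
```

Because the sort groups each cluster's points contiguously, the update step reads each segment sequentially, which is the kind of access pattern that tends to suit parallel hardware.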
Keywords :
graphics processing units; learning (artificial intelligence); parallel algorithms; pattern clustering; sorting; Kps-means; Nvidia GPU; all-prefix-sum sorting step; data preprocessing step; data updating step; graphics processing unit; high performance computing; numerical software package; parallel K-means clustering; parallel architecture; parallel sorting; parallelism degree; Arrays; Instruction sets; Kernel; Memory management; Vectors; Clustering algorithms; biology and genetics; graphics processors
Journal_Title :
Parallel and Distributed Systems, IEEE Transactions on
DOI :
10.1109/TPDS.2012.234