Title :
Count Sort for GPU Computing
Author :
Sun, Weidong ; Ma, Zongmin
Author_Institution :
Sch. of Comput., Shenyang Inst. of Aeronaut. Eng., Shenyang, China
Abstract :
Counting sort is a simple, stable and efficient sort algorithm with linear running time, which is a fundamental building block for many applications. This paper depicts the design issues of a data parallel implementation of the count sort algorithm on a commodity multiprocessor GPU using the Compute Unified Device Architecture (CUDA) platform, both from NVIDIA Corporation. The full parallel version runs much faster than any serial implementation on CPU with the loss of stability due to the limitation of the massive threads parallel model. But the thread-level parallel implementation still provides an efficient parallel sort primitive for many applications, which do not require stable sort or can be adapted for unstable subroutines.
Keywords :
multiprocessing programs; parallel processing; software architecture; sorting; CUDA platform; GPU computing; commodity multiprocessor GPU; compute unified device architecture; counting sort; data parallel implementation; linear running time; sort algorithm; Application software; Bioinformatics; Computer architecture; Computer graphics; Concurrent computing; Hardware; Parallel processing; Parallel programming; Programming profession; Yarn; CUDA; Count sort; GPGPU; Parallel programming;
Conference_Titel :
Parallel and Distributed Systems (ICPADS), 2009 15th International Conference on
Conference_Location :
Shenzhen
Print_ISBN :
978-1-4244-5788-5
DOI :
10.1109/ICPADS.2009.30