Regional Information Center for Science and Technology - Optimization of linked list prefix computations on multithreaded GPUs using CUDA

DocumentCode :

2441604

Title :

Optimization of linked list prefix computations on multithreaded GPUs using CUDA

Author :

Wei, Zheng ; Jaja, Joseph

Author_Institution :

Dept. of Electr. & Comput. Eng., Univ. of Maryland, College Park, MD, USA

fYear :

2010

fDate :

19-23 April 2010

Firstpage :

Lastpage :

Abstract :

We present a number of optimization techniques to compute prefix sums on linked lists and implement them on multithreaded GPUs using CUDA. Prefix computations on linked structures generally involve highly irregular, fine-grain memory accesses that are typical of many computations on linked lists, trees, and graphs. Although the current generation of GPUs provides substantial computational power and extremely high-bandwidth memory access, GPUs may at first appear to be geared primarily toward streamed, highly data-parallel computations. In this paper, we introduce an optimized multithreaded GPU algorithm for prefix computations that uses a randomization process to reduce the problem to a large number of fine-grain computations. We map these fine-grain computations onto multithreaded GPUs in such a way that the processing cost per element is shown to be close to the best possible. Our experimental results show scalability for list sizes ranging from 1M to 256M nodes, and our implementation significantly improves on recently published parallel implementations of list ranking, including implementations on the Cell Processor, the MTA-8, and the NVIDIA GeForce 200 series. It also compares favorably to the performance of the best known CUDA algorithm for the scan operation on the Tesla C1060.
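The abstract says that randomization reduces a linked-list prefix computation to many independent fine-grain computations. The sketch below illustrates one generic way to do this with random splitters. It is a sequential Python illustration, not the authors' CUDA implementation. The function name `list_prefix_sums`, the successor-array layout (`succ[i] == -1` marks the tail), and the `num_splitters` and `seed` parameters are all hypothetical. The comments mark the loops that a GPU version would presumably run with one thread per sublist or per node.

```python
import random
from typing import Dict, List


def list_prefix_sums(succ: List[int], values: List[int], head: int,
                     num_splitters: int, seed: int = 0) -> List[int]:
    """Inclusive prefix sums along a linked list (hypothetical sketch).

    succ[i] is the index of the node after i, or -1 for the tail; the list is
    assumed to contain all n >= 1 nodes. Returns out, where out[i] is the sum
    of values from head through node i in list order.
    """
    n = len(succ)
    rng = random.Random(seed)

    # Phase 1: pick random splitters; the head always starts a sublist.
    others = [i for i in range(n) if i != head]
    splitters = [head] + rng.sample(others, min(num_splitters, len(others)))
    is_splitter = [False] * n
    for s in splitters:
        is_splitter[s] = True

    # Phase 2: walk every sublist independently (one thread per sublist in a
    # parallel version), computing local prefix sums and sublist totals.
    local = [0] * n
    owner = [0] * n                  # splitter that heads each node's sublist
    total: Dict[int, int] = {}
    next_sub: Dict[int, int] = {}    # next splitter in list order, or -1
    for s in splitters:
        acc, node = 0, s
        while True:
            acc += values[node]
            local[node] = acc
            owner[node] = s
            nxt = succ[node]
            if nxt == -1 or is_splitter[nxt]:
                break
            node = nxt
        total[s] = acc
        next_sub[s] = succ[node]

    # Phase 3: prefix sums over the much shorter list of sublists.
    offset: Dict[int, int] = {}
    acc, s = 0, head
    while s != -1:
        offset[s] = acc
        acc += total[s]
        s = next_sub[s]

    # Phase 4: add each sublist's offset to its nodes (one thread per node).
    return [local[i] + offset[owner[i]] for i in range(n)]


if __name__ == "__main__":
    # List order 3 -> 0 -> 4 -> 1 -> 2, stored as a successor array.
    succ = [4, 2, -1, 0, 1]
    values = [10, 20, 30, 40, 50]
    print(list_prefix_sums(succ, values, head=3, num_splitters=2))
    # prints [50, 120, 150, 40, 100]
```

If every value is 1, the same routine computes list ranking. The result does not depend on which splitters are drawn. The random choice only affects how balanced the sublists are, and therefore how evenly a parallel version would spread the work across threads.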

Keywords :

coprocessors; multi-threading; CUDA; MTA; NVIDIA GeForce 200 series; Tesla C1060; cell processor; data parallel computations; extremely high bandwidth memory accesses; fine grain memory accesses; linked list prefix computations; multithreaded GPUs; optimization; prefix sums; randomization process; Concurrent computing; Educational institutions; Graphics processing unit; Hardware; Multicore processing; Parallel algorithms; Parallel processing; Phase change random access memory; Scalability; Tree graphs; GPU; Parallel Computing; Prefix Computation

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Parallel & Distributed Processing (IPDPS), 2010 IEEE International Symposium on

Conference_Location :

Atlanta, GA

ISSN :

1530-2075

Print_ISBN :

978-1-4244-6442-5

Type :

conf

DOI :

10.1109/IPDPS.2010.5470455

Filename :

5470455

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2441604