Regional Information Center for Science and Technology (RICeST) - Design and implementation of a parallel priority queue on many-core architectures

DocumentCode :

2037399

Title :

Design and implementation of a parallel priority queue on many-core architectures

Author :

Xi He ; Agarwal, Deborah ; Prasad, Sushil K.

Author_Institution :

Dept. of Comput. Sci., Georgia State Univ., Atlanta, GA, USA

fYear :

2012

fDate :

18-22 Dec. 2012

Firstpage :

Lastpage :

Abstract :

An efficient parallel priority queue is at the core of efforts to parallelize important non-numeric irregular computations such as discrete event simulation scheduling and branch-and-bound algorithms. GPGPUs can provide a powerful computing platform for such non-numeric computations if an efficient parallel priority queue implementation is available. In this paper, aiming at fine-grained applications, we develop an efficient parallel heap system employing CUDA. To our knowledge, this is the first parallel priority queue implementation on many-core architectures, and thus represents a breakthrough. By allowing wide heap nodes to enable thousands of simultaneous deletions of highest-priority items and insertions of new items, and by taking full advantage of CUDA's data-parallel SIMT architecture, we demonstrate up to 30-fold absolute speedup for relatively fine-grained compute loads compared to an optimized sequential priority queue implementation on fast multicores. By comparison, our optimized multicore parallelization of the parallel heap yields only a 2-3 fold speedup for such fine-grained loads. This parallelization of a tree-based data structure on GPGPUs provides a roadmap for future parallelizations of other such data structures.
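
The "wide heap node" idea in the abstract can be illustrated with a small sequential sketch. This is not the paper's CUDA implementation: it is a plain Python model written under the assumption that every node holds exactly `r` sorted items and that every insertion or deletion moves exactly `r` items at a time. The names `WideHeap`, `insert_batch`, and `delete_min_batch` are hypothetical. Each `merge` step below runs sequentially here; in a GPU version it is presumably the step that the data-parallel threads would share.

```python
from heapq import merge


class WideHeap:
    """Min-heap whose nodes each hold exactly r sorted items (sequential sketch)."""

    def __init__(self, r):
        self.r = r
        self.nodes = []  # nodes[i] is a sorted list; its children are 2i+1 and 2i+2

    def insert_batch(self, items):
        """Insert exactly r items as one batch."""
        if len(items) != self.r:
            raise ValueError("this sketch only accepts batches of exactly r items")
        carry = sorted(items)
        # Collect the ancestors of the new last node, from its parent up to the root.
        path, i = [], len(self.nodes)
        while i > 0:
            i = (i - 1) // 2
            path.append(i)
        # Walk root-to-parent: each node keeps the r smallest, the rest move down.
        for i in reversed(path):
            merged = list(merge(self.nodes[i], carry))
            self.nodes[i], carry = merged[:self.r], merged[self.r:]
        self.nodes.append(carry)

    def delete_min_batch(self):
        """Remove and return the r smallest items, in sorted order."""
        if not self.nodes:
            raise IndexError("delete from an empty heap")
        smallest = self.nodes[0]
        last = self.nodes.pop()
        if self.nodes:  # refill the root from the last node, then repair the heap
            self.nodes[0] = last
            self._sift_down(0)
        return smallest

    def _sift_down(self, i):
        r, nodes = self.r, self.nodes
        while True:
            left, right = 2 * i + 1, 2 * i + 2
            if left >= len(nodes):
                return  # leaf node
            if right >= len(nodes):
                # Only a left child, which must be a leaf: split the merged 2r items.
                merged = list(merge(nodes[i], nodes[left]))
                nodes[i], nodes[left] = merged[:r], merged[r:]
                return
            if nodes[i][-1] <= min(nodes[left][0], nodes[right][0]):
                return  # heap property already holds here
            # The middle r items go to the child with the larger maximum, whose
            # subtree is guaranteed to hold only items at least that large. The
            # largest r items go to the other child, where the repair continues.
            if nodes[left][-1] >= nodes[right][-1]:
                hi, lo = left, right
            else:
                hi, lo = right, left
            merged = list(merge(nodes[i], nodes[left], nodes[right]))
            nodes[i], nodes[hi], nodes[lo] = merged[:r], merged[r:2 * r], merged[2 * r:]
            i = lo
```

With `r = 2`, inserting the batches `[5, 1]`, `[3, 2]`, `[0, 9]`, and `[4, 4]` and then calling `delete_min_batch()` four times returns the items in globally sorted order, two per call. A real many-core version would use a much larger `r` so that each merge offers enough work for thousands of threads.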

Keywords :

data structures; graphics processing units; multiprocessing systems; parallel architectures; queueing theory; CUDA; GPGPU; branch-and-bound algorithms; data parallel SIMT architecture; discrete event simulation scheduling; fast multicores; fine-grained applications; fine-grained compute loads; fine-grained loads; many-core architectures; nonnumeric computations; nonnumeric irregular computations; optimized multicore parallelization; optimized sequential priority queue implementation; parallel heap system; parallel priority queue design; parallel priority queue implementation; tree-based data structure; wide heap nodes

fLanguage :

English

Publisher :

IEEE

Conference_Title :

High Performance Computing (HiPC), 2012 19th International Conference on

Conference_Location :

Pune

Print_ISBN :

978-1-4673-2372-7

Electronic_ISBN :

978-1-4673-2370-3

Type :

conf

DOI :

10.1109/HiPC.2012.6507490

Filename :

6507490

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2037399