DocumentCode
50322
Title
BFS-4K: An Efficient Implementation of BFS for Kepler GPU Architectures
Author
Busato, Federico ; Bombieri, Nicola
Author_Institution
Dept. of Comput. Sci., Univ. of Verona, Verona, Italy
Volume
26
Issue
7
fYear
2015
fDate
July 1 2015
Firstpage
1826
Lastpage
1838
Abstract
Breadth-first search (BFS) is one of the most common graph traversal algorithms and the building block for a wide range of graph applications. With the advent of graphics processing units (GPUs), several works have been proposed to accelerate graph algorithms and, in particular, BFS on such many-core architectures. Nevertheless, BFS has proven to be an algorithm for which it is hard to obtain better performance from parallelization. Indeed, the proposed solutions take advantage of the massive parallelism of GPUs, but they are often asymptotically less efficient than the fastest CPU implementations. This paper presents BFS-4K, a parallel implementation of BFS for GPUs that exploits the more advanced features of GPU-based platforms (i.e., NVIDIA Kepler) and that achieves an asymptotically optimal work complexity. The paper presents different strategies implemented in BFS-4K to deal with the potential workload imbalance and thread divergence caused by any actual graph non-homogeneity. The paper presents the experimental results obtained on several graphs of different sizes and characteristics to understand how the proposed techniques are applied and combined to obtain the best performance from the parallel BFS visits. Finally, an analysis of the most representative state-of-the-art BFS implementations for GPUs and their comparison with BFS-4K are reported to underline the efficiency of the proposed solution.
Keywords
graphics processing units; multiprocessing systems; tree searching; BFS-4K implementation; GPU parallelism; Kepler GPU architectures; breadth-first search; graph traversal algorithms; graphics processing unit; many-core architectures; thread divergence; workload imbalance; Complexity theory; Computer architecture; Graphics processing units; Image edge detection; Instruction sets; Kernel; Parallel processing; BFS; CUDA; GPU; Kepler; Parallel graph algorithms
fLanguage
English
Journal_Title
Parallel and Distributed Systems, IEEE Transactions on
Publisher
IEEE
ISSN
1045-9219
Type
jour
DOI
10.1109/TPDS.2014.2330597
Filename
6832649