  • DocumentCode
    50322
  • Title

    BFS-4K: An Efficient Implementation of BFS for Kepler GPU Architectures

  • Author

    Busato, Federico ; Bombieri, Nicola

  • Author_Institution
    Dept. of Comput. Sci., Univ. of Verona, Verona, Italy
  • Volume
    26
  • Issue
    7
  • fYear
    2015
  • fDate
    July 1 2015
  • Firstpage
    1826
  • Lastpage
    1838
  • Abstract
    Breadth-first search (BFS) is one of the most common graph traversal algorithms and the building block for a wide range of graph applications. With the advent of graphics processing units (GPUs), several works have been proposed to accelerate graph algorithms and, in particular, BFS on such many-core architectures. Nevertheless, BFS has proven to be an algorithm for which it is hard to obtain better performance from parallelization. Indeed, the proposed solutions take advantage of the massive parallelism of GPUs, but they are often asymptotically less efficient than the fastest CPU implementations. This paper presents BFS-4K, a parallel implementation of BFS for GPUs that exploits the more advanced features of GPU-based platforms (i.e., NVIDIA Kepler) and that achieves an asymptotically optimal work complexity. The paper presents different strategies implemented in BFS-4K to deal with the potential workload imbalance and thread divergence caused by any actual graph non-homogeneity. It then reports experimental results on several graphs of different sizes and characteristics to show how the proposed techniques are applied and combined to obtain the best performance from parallel BFS visits. Finally, the most representative state-of-the-art BFS implementations for GPUs are analyzed and compared with BFS-4K to underline the efficiency of the proposed solution.
  • Keywords
    graphics processing units; multiprocessing systems; tree searching; BFS-4K implementation; GPU parallelism; Kepler GPU architectures; breadth-first search; graph traversal algorithms; graphics processing unit; many-core architectures; thread divergence; workload imbalance; Complexity theory; Computer architecture; Graphics processing units; Image edge detection; Instruction sets; Kernel; Parallel processing; BFS; CUDA; GPU; Kepler; Parallel graph algorithms
  • fLanguage
    English
  • Journal_Title
    Parallel and Distributed Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1045-9219
  • Type

    jour

  • DOI
    10.1109/TPDS.2014.2330597
  • Filename
    6832649