On Analyzing Large Graphs Using GPUs

Author

Chatterjee, Avhishek ; Radhakrishnan, S. ; Antonio, John K.

Author_Institution

Sch. of Comput. Sci., Univ. of Oklahoma, Norman, OK, USA

fYear

2013

fDate

20-24 May 2013

Firstpage

751

Lastpage

760

Abstract

Studying properties of graphs is essential to various applications, and the recent growth of online social networks has spurred interest in analyzing their structures using Graphical Processing Units (GPUs). Utilizing the faster available shared memory on GPUs has provided tremendous speed-up for solving many general-purpose problems. However, when the data required for processing is large and needs to be stored in the global memory instead of the shared memory, simultaneous memory accesses by executing threads become the bottleneck for achieving higher throughput. In this paper, for storing large graphs, we propose and evaluate techniques to efficiently utilize the different levels of the memory hierarchy of GPUs, with the focus being on the larger global memory. Given a graph G = (V, E), we provide an algorithm to count the number of triangles in G while storing the adjacency information in the global memory. Our computation techniques and data structure for retrieving the adjacency information are derived from processing the breadth-first-search tree of the input graph. Techniques to generate combinations of nodes, for testing the properties of the graphs induced by those nodes, are also discussed in detail. Our methods can be extended to solve other combinatorial counting problems on graphs, such as finding the number of connected subgraphs of size k, the number of cliques (resp. independent sets) of size k, and related problems for large data sets. In the context of the triangle counting algorithm, we analyze and utilize primitives such as memory access coalescing and avoiding partition camping, which offset the increase in access latency of using a slower but larger global memory. Our experimental results for the GPU implementation show at least a 10 times speedup for triangle counting over the CPU counterpart. Another 6-8% increase in performance is obtained by utilizing the above-mentioned primitives as compared to the naïve implementation of the program on the GPU.
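
As an illustration of the counting problem the abstract describes, the following is a minimal CPU-side sketch in Python, not the authors' GPU implementation. It relies on one standard property of breadth-first search in an undirected graph: every edge joins nodes in the same level or in adjacent levels, so the three nodes of a triangle span at most two consecutive levels. The function names, the set-based adjacency structure, and the example graph are all assumptions made for this sketch; the abstract does not specify the paper's data structure.

```python
from collections import deque


def bfs_levels(adj, root=0):
    """Return the BFS level of every node reachable from root."""
    level = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
    return level


def count_triangles(adj):
    """Count each triangle once by enforcing the ordering u < v < w."""
    total = 0
    for u in adj:
        for v in adj[u]:
            if u < v:
                # Common neighbors of u and v that are larger than v.
                total += sum(1 for w in adj[u] & adj[v] if w > v)
    return total


# Hypothetical example graph: a 4-clique on {0, 1, 2, 3} plus a pendant node 4.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

levels = bfs_levels(adj)
triangles = count_triangles(adj)
```

In this sketch the BFS levels only demonstrate the level property; a GPU version along the lines of the abstract would presumably use that structure to lay out adjacency data in global memory, but those details are not given here.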

Keywords

data structures; graph theory; shared memory systems; tree searching; GPU; breadth-first-search tree; combinatorial counting problems; data structure; general-purpose problems; global memory; graphical processing units; large graphs; memory hierarchy; online social networks; shared memory; simultaneous memory accesses; triangle counting algorithm; Context; Data structures; Graphics processing units; Instruction sets; Partitioning algorithms; Social network services; Testing; CUDA; GPU optimization; Triangle counting; global memory; graph problems;

fLanguage

English

Publisher

IEEE

Conference_Titel

Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2013 IEEE 27th International

Conference_Location

Cambridge, MA

Print_ISBN

978-0-7695-4979-8

Type

conf

DOI

10.1109/IPDPSW.2013.235

Filename

6650952