DocumentCode
1996102
Title
On Analyzing Large Graphs Using GPUs
Author
Chatterjee, Avhishek ; Radhakrishnan, S. ; Antonio, John K.
Author_Institution
Sch. of Comput. Sci., Univ. of Oklahoma, Norman, OK, USA
fYear
2013
fDate
20-24 May 2013
Firstpage
751
Lastpage
760
Abstract
Studying properties of graphs is essential to various applications, and recent growth of online social networks has spurred interests in analyzing their structures using Graphical Processing Units (GPUs). Utilizing the faster available shared memory on GPUs have provided tremendous speed-up for solving many general-purpose problems. However, when data required for processing is large and needs to be stored in the global memory instead of the shared memory, simultaneous memory accesses by threads in execution becomes the bottleneck for achieving higher throughput. In this paper, for storing large graphs, we propose and evaluate techniques to efficiently utilize the different levels of the memory hierarchy of GPUs, with the focus being on the larger global memory. Given a graph G = (V, E), we provide an algorithm to count the number of triangles in G, while storing the adjacency information on the global memory. Our computation techniques and data structure for retrieving the adjacency information is derived from processing the breadth-first-search tree of the input graph. Also, techniques to generate combinations of nodes for testing the properties of graphs induced by the same are discussed in detail. Our methods can be extended to solve other combinatorial counting problems on graphs, such as finding the number of connected sub graphs of size k, number of cliques (resp. independent sets) of size k, and related problems for large data sets. In the context of the triangle counting algorithm, we analyze and utilize primitives such as memory access coalescing and avoiding partition camping that offset the increase in access latency of using a slower but larger global memory. Our experimental results for the GPU implementation show at least 10 times speedup for triangle counting over the CPU counterpart. Another 6 - 8% increase in performance is obtained by utilizing the above mentioned primitives as compared to the naïve implementation of the program on the G- U.
Keywords
data structures; graph theory; shared memory systems; tree searching; GPU; breadth-first-search tree; combinatorial counting problems; data structure; general-purpose problems; global memory; graphical processing units; large graphs; memory hierarchy; online social networks; shared memory; simultaneous memory accesses; triangle counting algorithm; Context; Data structures; Graphics processing units; Instruction sets; Partitioning algorithms; Social network services; Testing; CUDA; GPU optimization; Triangle counting; global memory; graph problems;
fLanguage
English
Publisher
ieee
Conference_Titel
Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2013 IEEE 27th International
Conference_Location
Cambridge, MA
Print_ISBN
978-0-7695-4979-8
Type
conf
DOI
10.1109/IPDPSW.2013.235
Filename
6650952
Link To Document