DocumentCode :
1878341
Title :
Parallel distributed breadth first search on GPU
Author :
Ueno, K. ; Suzumura, Toyotaro
Author_Institution :
JST CREST, Tokyo Inst. of Technol., Tokyo, Japan
fYear :
2013
fDate :
18-21 Dec. 2013
Firstpage :
314
Lastpage :
323
Abstract :
In this paper we propose a highly optimized parallel and distributed BFS on GPU for the Graph500 benchmark. We evaluate the performance of our implementation on the TSUBAME2.0 supercomputer. We achieve 317 GTEPS (billion traversed edges per second) at scale 35 (a large graph with 34.4 billion vertices and 550 billion edges) using 1366 nodes and 4096 GPUs. With this score, the TSUBAME2.0 supercomputer was ranked fourth in the ranking list announced in June 2012. Our performance analysis shows that inter-node communication limits the performance of our GPU implementation. We also propose SIMD Variable-Length Quantity (VLQ) encoding for compressing communication data on the GPU.
Keywords :
data compression; encoding; graphics processing units; parallel machines; performance evaluation; tree searching; variable length codes; GPU; GTEPS; Graph500 benchmark; SIMD variable-length quantity encoding; TSUBAME2.0 supercomputer; VLQ encoding; communication data compression; distributed BFS; inter-node communication; parallel BFS; parallel distributed breadth first search; Algorithm design and analysis; Benchmark testing; Kernel; Partitioning algorithms; BFS; Graph500; Supercomputer
fLanguage :
English
Publisher :
IEEE
Conference_Title :
High Performance Computing (HiPC), 2013 20th International Conference on
Conference_Location :
Bangalore
Type :
conf
DOI :
10.1109/HiPC.2013.6799136
Filename :
6799136