DocumentCode :
1791591
Title :
Large-scale distributed sorting for GPU-based heterogeneous supercomputers
Author :
Shamoto, Hideyuki ; Shirahata, Koichi ; Drozd, A. ; Sato, Hitoshi ; Matsuoka, Satoshi
Author_Institution :
Tokyo Inst. of Technol., Tokyo, Japan
fYear :
2014
fDate :
27-30 Oct. 2014
Firstpage :
510
Lastpage :
518
Abstract :
Splitter-based parallel sorting algorithms are known to be highly efficient for distributed sorting due to their low communication complexity. Although GPU accelerators can reduce computation cost in general, their effectiveness for distributed sorting on large-scale heterogeneous GPU-based systems remains unclear. We investigate the applicability of GPU devices to splitter-based algorithms and extend HykSort, an existing splitter-based algorithm, by offloading costly computation phases to GPUs. We also handle GPU memory overflows by introducing an iterative approach that sorts multiple chunks and merges them into one array. We evaluate the performance of our implementation with local sort acceleration on the TSUBAME2.5 supercomputer, which comprises over 4000 NVIDIA K20x GPUs. A weak-scaling evaluation shows that we achieve a 389-times speedup with 0.25 TB/s throughput when sorting 4 TB of 64-bit integers on 1024 nodes compared to running on 1 node; in the CPU vs. GPU comparison, however, our implementation achieves only a 1.40-times speedup using 1024 nodes. Detailed analysis reveals that this limitation is almost entirely due to the bottleneck in CPU-GPU host-to-device bandwidth. With orders-of-magnitude improvements planned for next-generation GPUs, the performance boost is expected to be tremendous, in line with other successful GPU accelerations.
Keywords :
graphics processing units; parallel machines; sorting; 4TB 64bit integer; CPU-GPU host-to-device bandwidth; GPU accelerations; GPU accelerators; GPU memory overflows; GPU-based heterogeneous supercomputers; GPU-based systems; HykSort; NVIDIA K20x GPU; TSUBAME2.5 supercomputer; computation phases; large-scale distributed sorting; splitter-based parallel sorting algorithms; Algorithm design and analysis; Arrays; Data transfer; Graphics processing units; Histograms; Performance evaluation; Sorting; Big Data Applications; Distributed Systems; GPGPU; Hybrid Programming; Sorting;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Big Data (Big Data), 2014 IEEE International Conference on
Conference_Location :
Washington, DC
Type :
conf
DOI :
10.1109/BigData.2014.7004268
Filename :
7004268