DocumentCode
74605
Title
Accelerating LINPACK with MPI-OpenCL on Clusters of Multi-GPU Nodes
Author
Gangwon Jo ; Jeongho Nah ; Jun Lee ; Jungwon Kim ; Jaejin Lee
Author_Institution
Dept. of Comput. Sci. & Eng., Seoul Nat. Univ., Seoul, South Korea
Volume
26
Issue
7
fYear
2015
fDate
July 1 2015
Firstpage
1814
Lastpage
1825
Abstract
OpenCL is an open standard for writing parallel applications for heterogeneous computing systems. Since its usage is restricted to a single operating system instance, programmers need to use a mix of OpenCL and MPI to program a heterogeneous cluster. In this paper, we introduce an MPI-OpenCL implementation of the LINPACK benchmark for a cluster with multi-GPU nodes. The LINPACK benchmark is one of the most widely used benchmark applications for evaluating high-performance computing systems. Our implementation is based on High Performance LINPACK (HPL) and uses the blocked LU decomposition algorithm. We show that optimizations aimed at reducing the overhead of CPUs are necessary to overcome the performance gap between the CPUs and the multiple GPUs. Our LINPACK implementation achieves 93.69 Tflops (46 percent of the theoretical peak) on the target cluster with 49 nodes, each node containing two eight-core CPUs and four GPUs.
Keywords
graphics processing units; message passing; parallel processing; HPL; LINPACK benchmark; MPI-OpenCL; blocked LU decomposition algorithm; eight-core CPU; heterogeneous computing systems; high performance LINPACK; high performance computing systems; multiGPU node clusters; open standard; Benchmark testing; Clustering algorithms; Graphics processing units; Kernel; Matrix decomposition; Optimization; Programming; Cluster; GPU; OpenCL; heterogeneous computing; high performance LINPACK
fLanguage
English
Journal_Title
Parallel and Distributed Systems, IEEE Transactions on
Publisher
IEEE
ISSN
1045-9219
Type
jour
DOI
10.1109/TPDS.2014.2321742
Filename
6846313