Parallel Matrix-Matrix Multiplication Based on HPL with a GPU-Accelerated PC Cluster

Author

Wang, Qin ; Ohmura, Junichi ; Axida, Shan ; Miyoshi, Takefumi ; Irie, Hidetsugu ; Yoshinaga, Tsutomu

Author_Institution

Dept. of Inf. Network Syst., Univ. of Electro-Commun., Chofu, Japan

fYear

2010

fDate

17-19 Nov. 2010

Firstpage

243

Lastpage

248

Abstract

In this paper, we propose an approach for significantly improving the performance of parallel matrix-matrix multiplication on a GPU-accelerated cluster. On a single node, we implement a double-precision general matrix-matrix multiplication (dgemm) operation that runs on the CPUs and the GPU in parallel, and achieve a performance improvement of 32% compared with the GPU-only case and 56% compared with the CPUs-only case. For the entire cluster, we apply an overlapped GPU acceleration scheme to High-Performance Linpack (HPL), which eliminates the close dependency between the LU decomposition and the dgemm operation, and achieve a performance improvement of 5.72% compared with the flat GPU acceleration case.
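
The abstract does not say how the dgemm work is divided between the CPUs and the GPU. One common way to build such a hybrid dgemm is to split the columns of B by a fixed share, compute the two partial products independently, and join the results. The sketch below illustrates only that partitioning idea, under the assumption of a column split; the names `matmul`, `split_columns`, `hybrid_dgemm`, and `gpu_share` are hypothetical, and both "devices" are simulated with plain Python rather than real BLAS or CUDA calls.

```python
# Illustrative sketch only: a column-split hybrid matrix product.
# All names here are hypothetical and not taken from the paper.

def matmul(a, b):
    """Naive product of a (m x k) and b (k x n), both given as lists of rows."""
    k = len(b)
    n = len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)]
            for row in a]

def split_columns(b, gpu_share):
    """Split the columns of b into a 'GPU' part and a 'CPU' part."""
    n = len(b[0])
    # Clamp so that any share in [0, 1] yields a valid column count.
    n_gpu = max(0, min(n, int(n * gpu_share)))
    b_gpu = [row[:n_gpu] for row in b]
    b_cpu = [row[n_gpu:] for row in b]
    return b_gpu, b_cpu

def hybrid_dgemm(a, b, gpu_share=0.5):
    """Compute a @ b as two independent column blocks, then join them."""
    b_gpu, b_cpu = split_columns(b, gpu_share)
    # In a real implementation these two products would run concurrently,
    # one on the GPU and one on the CPU cores; here they run one after another.
    c_gpu = matmul(a, b_gpu)
    c_cpu = matmul(a, b_cpu)
    return [g + c for g, c in zip(c_gpu, c_cpu)]
```

Because the two column blocks are independent, the result is the same for any value of `gpu_share`; in practice the share would be tuned so that both devices finish at about the same time.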

Keywords

computer graphic equipment; coprocessors; matrix multiplication; parallel programming; CPU; GPU accelerated PC cluster; HPL; LU decomposition; dgemm operation; high-performance Linpack; parallel matrix-matrix multiplication; performance improvement; GPU; MPI; cluster; heterogeneous; matrix-multiplier; parallelization

fLanguage

English

Publisher

IEEE

Conference_Title

Networking and Computing (ICNC), 2010 First International Conference on

Conference_Location

Higashi-Hiroshima

Print_ISBN

978-1-4244-8918-3

Electronic_ISBN

978-0-7695-4277-5

Type

conf

DOI

10.1109/IC-NC.2010.39

Filename

5695242