DocumentCode :
3575192
Title :
On Implementing Sparse Matrix Multi-vector Multiplication on GPUs
Author :
Abu-Sufah, Walid ; Ahmad, Khalid
Author_Institution :
Univ. of Illinois at Urbana-Champaign, Urbana, IL, USA
fYear :
2014
Firstpage :
1117
Lastpage :
1124
Abstract :
Sparse matrix-vector and multi-vector multiplications (SpMV and SpMM) are performance-bottleneck operations in numerous HPC applications. A variety of SpMV GPU kernels using different matrix storage formats have been developed to accelerate these applications. Unlike SpMV, where matrix elements are accessed only once, multiplying by k vectors requires accessing matrix elements k times. In this paper we explore the design of efficient GPU SpMM kernels that exploit two common matrix sparsity patterns: diagonal matrices and matrices with uniform row lengths. Our kernels use GPU registers to exploit the potential data reuse in SpMM. To evaluate the performance of our SpMM kernels we use 28 structured matrices and 29 matrices with uniform row lengths. On NVIDIA's Kepler-based Tesla K20 GPU, for structured grid matrices, the average speedup over the best-performing state-of-the-art SpMV kernel, including NVIDIA's kernels, is 2.3x and the maximum is 4.6x. For unstructured mesh matrices, the average speedup is 2.6x and the maximum is 4.1x. Compared to NVIDIA's cuSPARSE SpMM kernel, the average speedup is 1.8x and the maximum is 2.4x for structured grid matrices. For unstructured mesh matrices, the average speedup is 1.5x and the maximum is 2.6x.
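The data reuse that the abstract describes can be illustrated with a small CPU-side sketch. This is not the authors' GPU kernel: it is a minimal sequential C++ reference, assuming an ELL-style layout for matrices with uniform row lengths. The names `EllMatrix` and `spmm_ell` are hypothetical and chosen only for this illustration. Each matrix entry is read once into a local variable and then reused for all k vectors, which is the role the abstract assigns to GPU registers.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical ELL-style container (an assumption for this sketch):
// every row stores exactly `width` entries, padded with zero values
// where needed, laid out row by row.
struct EllMatrix {
    std::size_t rows;
    std::size_t width;
    std::vector<int> col;     // rows * width column indices
    std::vector<double> val;  // rows * width values
};

// Computes Y = A * X, where X holds k vectors stored row-major
// (entry i of vector v lives at x[i * k + v]) and Y uses the same layout.
std::vector<double> spmm_ell(const EllMatrix& a,
                             const std::vector<double>& x,
                             std::size_t k) {
    std::vector<double> y(a.rows * k, 0.0);
    for (std::size_t r = 0; r < a.rows; ++r) {
        double* acc = &y[r * k];  // k partial sums for this row
        for (std::size_t j = 0; j < a.width; ++j) {
            // Read the matrix entry and its column index once ...
            const double entry = a.val[r * a.width + j];
            const std::size_t c =
                static_cast<std::size_t>(a.col[r * a.width + j]);
            // ... and reuse them for every one of the k vectors.
            for (std::size_t v = 0; v < k; ++v) {
                acc[v] += entry * x[c * k + v];
            }
        }
    }
    return y;
}
```

Running k separate SpMV passes would instead re-read `val` and `col` once per vector. Fusing the passes this way is one simple way to get the reuse the abstract mentions; an actual GPU kernel would additionally have to map rows to threads and manage memory coalescing, which this sketch does not attempt.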
Keywords :
graphics processing units; mathematics computing; matrix multiplication; parallel processing; sparse matrices; vectors; GPU SpMM kernels; GPU registers; HPC applications; NVIDIA Kepler-based Tesla K20 GPU; SpMV GPU kernels; diagonal matrices; matrix elements; matrix sparsity pattern; matrix storage formats; performance evaluation; sparse matrix multivector multiplication; structured grid matrices; uniform row length matrices; unstructured mesh matrices; Arrays; Graphics processing units; Instruction sets; Kernel; Registers; Sparse matrices; Vectors; CUDA; CUSP; GPU; SpMM; SpMV; cuSPARSE; sparse linear algebra; structured grid; unstructured mesh;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded Software and Syst (HPCC,CSS,ICESS), 2014 IEEE Intl Conf on
Print_ISBN :
978-1-4799-6122-1
Type :
conf
DOI :
10.1109/HPCC.2014.165
Filename :
7056883