Regional Information Center for Science and Technology - On Implementing Sparse Matrix Multi-vector Multiplication on GPUs

DocumentCode :

3575192

Title :

On Implementing Sparse Matrix Multi-vector Multiplication on GPUs

Author :

Abu-Sufah, Walid ; Ahmad, Khalid

Author_Institution :

Univ. of Illinois at Urbana-Champaign, Urbana, IL, USA

fYear :

2014

Firstpage :

1117

Lastpage :

1124

Abstract :

Sparse matrix-vector and multi-vector multiplications (SpMV and SpMM) are performance bottlenecks in numerous HPC applications. A variety of SpMV GPU kernels using different matrix storage formats have been developed to accelerate these applications. Unlike SpMV, where matrix elements are accessed only once, multiplying by k vectors requires accessing matrix elements k times. In this paper we explore the design of efficient GPU SpMM kernels that exploit two common matrix sparsity patterns: diagonal matrices and matrices with uniform row lengths. Our kernels use GPU registers to exploit the potential data reuse in SpMM. To evaluate the performance of our SpMM kernels, we use 28 structured matrices and 29 matrices with uniform row lengths. On NVIDIA's Kepler-based Tesla K20 GPU, for structured grid matrices, the average speedup over the best-performing state-of-the-art SpMV kernel, including NVIDIA's kernels, is 2.3x and the maximum is 4.6x. For unstructured mesh matrices, the average speedup is 2.6x and the maximum is 4.1x. Compared to NVIDIA's cuSPARSE SpMM kernel, the average speedup is 1.8x and the maximum is 2.4x for structured grid matrices. For unstructured mesh matrices, the average speedup is 1.5x and the maximum is 2.6x.
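
The data-reuse idea in the abstract can be illustrated with a small CPU-side sketch. This is not the authors' GPU kernel, whose details the abstract does not give. It assumes an ELL-style layout for matrices with uniform row lengths (each row stores the same number of value/column slots, padded with zeros) and uses a hypothetical function name, `spmm_ell`. Each stored matrix element is read once and reused for all k vectors, with a per-row accumulator list standing in for the per-thread registers a GPU kernel might use.

```python
def spmm_ell(values, col_idx, X):
    """Multiply an ELL-stored sparse matrix by k dense vectors (illustrative only).

    values[r][j], col_idx[r][j]: j-th stored entry of row r. Every row has the
    same number of slots; padding slots hold 0.0 and any valid column index.
    X[c][v]: entry c of vector v, i.e. the k vectors stored side by side.
    Returns Y with Y[r][v] = sum over j of values[r][j] * X[col_idx[r][j]][v].
    """
    n_rows = len(values)
    k = len(X[0]) if X else 0
    Y = []
    for r in range(n_rows):           # on a GPU, one thread could own row r
        acc = [0.0] * k               # stands in for k per-thread registers
        for a, c in zip(values[r], col_idx[r]):
            x_row = X[c]
            # 'a' was read from the matrix once; reuse it for all k vectors
            for v in range(k):
                acc[v] += a * x_row[v]
        Y.append(acc)
    return Y


# Assumed example: the 3x3 matrix [[2,0,1],[0,3,0],[4,0,5]] with 2 slots per row.
values = [[2.0, 1.0], [3.0, 0.0], [4.0, 5.0]]
col_idx = [[0, 2], [1, 1], [0, 2]]    # row 1 is padded with a zero entry
X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]   # k = 2 vectors
print(spmm_ell(values, col_idx, X))
```

Calling an SpMV routine k times would instead reread `values` and `col_idx` once per vector, which is the repeated matrix access the abstract describes.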

Keywords :

graphics processing units; mathematics computing; matrix multiplication; parallel processing; sparse matrices; vectors; GPU SpMM kernels; GPU registers; HPC applications; NVIDIA Kepler-based Tesla K20 GPU; SpMV GPU kernels; diagonal matrices; matrix elements; matrix sparsity pattern; matrix storage formats; performance evaluation; sparse matrix multivector multiplication; structured grid matrices; uniform row length matrices; unstructured mesh matrices; Arrays; Graphics processing units; Instruction sets; Kernel; Registers; Sparse matrices; Vectors; CUDA; CUSP; GPU; SpMM; SpMV; cuSPARSE; sparse linear algebra; structured grid; unstructured mesh;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded Software and Syst (HPCC,CSS,ICESS), 2014 IEEE Intl Conf on

Print_ISBN :

978-1-4799-6122-1

Type :

conf

DOI :

10.1109/HPCC.2014.165

Filename :

7056883

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3575192