Accelerating the general band matrix multiplication using graphics processors

Author

Benner, Peter ; Remon, Alfredo ; Dufrechou, Ernesto ; Ezzatti, Pablo ; Quintana-Orti, Enrique S.

Author_Institution

Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany

fYear

2014

fDate

15-19 Sept. 2014

Firstpage

1

Lastpage

7

Abstract

In this paper, we leverage the intrinsic data-parallelism of the band matrix-matrix product to accelerate this operation on Graphics Processing Units (GPUs). In particular, we propose a Level-3 BLAS style algorithm to tackle the band matrix-matrix product and implement two GPU-based versions that off-load the most expensive computations (general dense matrix-matrix multiplication, triangular matrix-matrix multiplication, and matrix addition) to the hardware accelerator. Results collected using GPUs from the two most recent NVIDIA generations (“Fermi” and “Kepler”) and a complete set of benchmark cases (which differ in the matrix dimensions and bandwidth) show that the GPU-enabled implementations deliver a notable reduction of the execution time.
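
To make the operation named in the abstract concrete, the following is a minimal, unblocked CPU reference sketch of a band matrix times dense matrix product, C = A * B. It is not the paper's blocked Level-3 BLAS style algorithm and it does not run on a GPU. The function names `to_band` and `band_matmul` are hypothetical, and the LAPACK-style band storage layout (entry A[i][j] stored at row `ku + i - j` of the packed array) is an assumption made for illustration; the abstract does not state which layout the authors use.

```python
def to_band(a, kl, ku):
    """Pack a dense m x n matrix into band storage (assumed LAPACK-style layout).

    kl and ku are the numbers of sub- and super-diagonals. The packed array
    has kl + ku + 1 rows and n columns; entries outside the band are dropped.
    """
    m, n = len(a), len(a[0])
    ab = [[0.0] * n for _ in range(kl + ku + 1)]
    for j in range(n):
        for i in range(max(0, j - ku), min(m, j + kl + 1)):
            ab[ku + i - j][j] = a[i][j]
    return ab


def band_matmul(ab, m, kl, ku, b):
    """Return C = A * B, with A (m x n) given in band storage and B dense (n x p).

    Only the entries inside the band of A are visited, so the cost is
    proportional to (kl + ku + 1) * n * p rather than m * n * p.
    """
    n, p = len(b), len(b[0])
    c = [[0.0] * p for _ in range(m)]
    for j in range(n):
        for i in range(max(0, j - ku), min(m, j + kl + 1)):
            a_ij = ab[ku + i - j][j]
            for k in range(p):
                c[i][k] += a_ij * b[j][k]
    return c


if __name__ == "__main__":
    # Tridiagonal example: one sub-diagonal and one super-diagonal.
    a = [[1, 2, 0, 0],
         [3, 4, 5, 0],
         [0, 6, 7, 8],
         [0, 0, 9, 10]]
    b = [[1, 0], [0, 1], [1, 1], [2, 0]]
    print(band_matmul(to_band(a, 1, 1), 4, 1, 1, b))
```

The updates to different columns of C are independent of one another, which is the kind of data-parallelism the abstract refers to; a blocked variant would group these updates into dense and triangular sub-products.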

Keywords

graphics processing units; mathematics computing; matrix multiplication; Fermi generations; GPU-based versions; GPU-enabled implementations; Kepler generations; Level-3 BLAS style algorithm; NVIDIA; band matrix-matrix product; general band matrix multiplication; general dense matrix-matrix multiplication; graphics processors; hardware accelerator; intrinsic data-parallelism; matrix addition; triangular matrix-matrix multiplication; Acceleration; Bandwidth; Graphics processing units; Hardware; Kernel; Partitioning algorithms; Sparse matrices; BLAS; GPU; General Band Matrix Multiplication; LAPACK

fLanguage

English

Publisher

IEEE

Conference_Titel

2014 XL Latin American Computing Conference (CLEI)

Conference_Location

Montevideo

Type

conf

DOI

10.1109/CLEI.2014.6965142

Filename

6965142