DocumentCode
162635
Title
Accelerating the general band matrix multiplication using graphics processors
Author
Benner, Peter ; Remon, Alfredo ; Dufrechou, Ernesto ; Ezzatti, Pablo ; Quintana-Orti, Enrique S.
Author_Institution
Max Planck Inst. for Dynamics of Complex Tech. Syst., Magdeburg, Germany
fYear
2014
fDate
15-19 Sept. 2014
Firstpage
1
Lastpage
7
Abstract
In this paper, we leverage the intrinsic data-parallelism of the band matrix-matrix product to accelerate this operation on Graphics Processing Units (GPUs). In particular, we propose a Level-3 BLAS style algorithm to tackle the band matrix-matrix product and implement two GPU-based versions that off-load the most expensive computations (i.e., general dense matrix-matrix multiplication, triangular matrix-matrix multiplication and matrix addition) to the hardware accelerator. Results collected using GPUs from the two most recent generations of NVIDIA (“Fermi” and “Kepler”) and a complete set of benchmark cases (which differ in the matrix dimensions and bandwidth) show that the GPU-enabled implementations deliver a notable reduction of the execution time.
Keywords
graphics processing units; mathematics computing; matrix multiplication; Fermi generations; GPU-based versions; GPU-enabled implementations; Kepler generations; Level-3 BLAS style algorithm; NVIDIA; band matrix-matrix product; general band matrix multiplication; general dense matrix-matrix multiplication; graphics processing units; graphics processors; hardware accelerator; intrinsic data-parallelism; matrix addition; triangular matrix-matrix multiplication; Acceleration; Bandwidth; Graphics processing units; Hardware; Kernel; Partitioning algorithms; Sparse matrices; BLAS; GPU; General Band Matrix Multiplication; LAPACK
fLanguage
English
Publisher
IEEE
Conference_Titel
Computing Conference (CLEI), 2014 XL Latin American
Conference_Location
Montevideo
Type
conf
DOI
10.1109/CLEI.2014.6965142
Filename
6965142