• DocumentCode
    162635
  • Title

    Accelerating the general band matrix multiplication using graphics processors

  • Author

    Benner, Peter ; Remon, Alfredo ; Dufrechou, Ernesto ; Ezzatti, Pablo ; Quintana-Orti, Enrique S.

  • Author_Institution
    Max Planck Inst. for Dynamics of Complex Tech. Syst., Magdeburg, Germany
  • fYear
    2014
  • fDate
    15-19 Sept. 2014
  • Firstpage
    1
  • Lastpage
    7
  • Abstract
    In this paper, we leverage the intrinsic data-parallelism of the band matrix-matrix product to accelerate this operation on Graphics Processing Units (GPUs). In particular, we propose a Level-3 BLAS style algorithm to tackle the band matrix-matrix product and implement two GPU-based versions that off-load the most expensive computations - i.e., general dense matrix-matrix multiplication, triangular matrix-matrix multiplication and matrix addition - to the hardware accelerator. Results collected using GPUs from the two most recent NVIDIA generations (“Fermi” and “Kepler”) and a complete set of benchmark cases (which differ in the matrix dimensions and bandwidth) show that the GPU-enabled implementations deliver a notable reduction of the execution time.
  • Keywords
    graphics processing units; mathematics computing; matrix multiplication; Fermi generations; GPU-based versions; GPU-enabled implementations; Kepler generations; Level-3 BLAS style algorithm; NVIDIA; band matrix-matrix product; general band matrix multiplication; general dense matrix-matrix multiplication; graphics processing units; graphics processors; hardware accelerator; intrinsic data-parallelism; matrix addition; triangular matrix-matrix multiplication; Acceleration; Bandwidth; Graphics processing units; Hardware; Kernel; Partitioning algorithms; Sparse matrices; BLAS; GPU; General Band Matrix Multiplication; LAPACK
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computing Conference (CLEI), 2014 XL Latin American
  • Conference_Location
    Montevideo
  • Type

    conf

  • DOI
    10.1109/CLEI.2014.6965142
  • Filename
    6965142