• DocumentCode
    3575192
  • Title
    On Implementing Sparse Matrix Multi-vector Multiplication on GPUs
  • Author
    Abu-Sufah, Walid ; Ahmad, Khalid
  • Author_Institution
    Univ. of Illinois at Urbana-Champaign, Urbana, IL, USA
  • fYear
    2014
  • Firstpage
    1117
  • Lastpage
    1124
  • Abstract
    Sparse matrix-vector and multi-vector multiplications (SpMV and SpMM) are performance bottleneck operations in numerous HPC applications. A variety of SpMV GPU kernels using different matrix storage formats have been developed to accelerate these applications. Unlike SpMV, where matrix elements are accessed only once, multiplying by k vectors requires accessing matrix elements k times. In this paper we explore the design of efficient GPU SpMM kernels that exploit two common matrix sparsity patterns: diagonal matrices and matrices with uniform row lengths. Our kernels use GPU registers to exploit the potential data reuse in SpMM. To evaluate the performance of our SpMM kernels we use 28 structured matrices and 29 matrices with uniform row lengths. On NVIDIA's Kepler-based Tesla K20 GPU, for structured grid matrices, the average speedup over the best-performing state-of-the-art SpMV kernel, including NVIDIA's kernels, is 2.3x and the maximum is 4.6x. For unstructured mesh matrices, the average speedup is 2.6x and the maximum is 4.1x. Compared to NVIDIA's cuSPARSE SpMM kernel, the average speedup is 1.8x and the maximum is 2.4x for structured grid matrices. For unstructured mesh matrices, the average speedup is 1.5x and the maximum is 2.6x.
  • Keywords
    graphics processing units; mathematics computing; matrix multiplication; parallel processing; sparse matrices; vectors; GPU SpMM kernels; GPU registers; HPC applications; NVIDIA Kepler-based Tesla K20 GPU; SpMV GPU kernels; diagonal matrices; matrix elements; matrix sparsity pattern; matrix storage formats; performance evaluation; sparse matrix multivector multiplication; structured grid matrices; uniform row length matrices; unstructured mesh matrices; Arrays; Graphics processing units; Instruction sets; Kernel; Registers; Sparse matrices; Vectors; CUDA; CUSP; GPU; SpMM; SpMV; cuSPARSE; sparse linear algebra; structured grid; unstructured mesh;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded Software and Syst (HPCC,CSS,ICESS), 2014 IEEE Intl Conf on
  • Print_ISBN
    978-1-4799-6122-1
  • Type
    conf
  • DOI
    10.1109/HPCC.2014.165
  • Filename
    7056883