• DocumentCode
    1858403
  • Title
    Optimizing SpMV for Diagonal Sparse Matrices on GPU
  • Author
    Sun, Xiangzheng ; Zhang, Yunquan ; Wang, Ting ; Zhang, Xianyi ; Yuan, Liang ; Rao, Li
  • fYear
    2011
  • fDate
    13-16 Sept. 2011
  • Firstpage
    492
  • Lastpage
    501
  • Abstract
    Sparse matrix-vector multiplication (SpMV) is an important computational kernel in scientific applications. Its performance depends heavily on the nonzero distribution of the sparse matrix. In this paper, we propose a new storage format for diagonal sparse matrices, Compressed Row Segment with Diagonal-pattern (CRSD). In CRSD, we design diagonal patterns to represent the diagonal distribution. Because Graphics Processing Units (GPUs) offer tremendous computational power and OpenCL makes them more suitable for scientific computing, we implement SpMV for the CRSD format on GPUs using OpenCL. Since OpenCL kernels are compiled at runtime, we design a code generator that produces codelets for all diagonal patterns after the matrices are stored in CRSD format. Specifically, the generated codelets already contain the index information of the nonzeros, which reduces memory pressure during the SpMV operation. Furthermore, the code generator exploits properties of the memory architecture and thread scheduling on GPUs to improve performance. In the evaluation, we select four storage formats from prior state-of-the-art GPU implementations (Bell and Garland, 2009). Experimental results show speedups of up to 1.52 and 1.94 over the best-performing of the four formats for double and single precision, respectively. We also evaluate on a two-socket quad-core Intel Xeon system, where the speedups reach up to 11.93 and 12.79 over the CSR format with 8 threads for double and single precision, respectively.
  • Keywords
    computer graphic equipment; coprocessors; matrix multiplication; memory architecture; multiprocessing systems; sparse matrices; CRSD format; GPU; OpenCL kernel; SpMV; code generator; compressed row segment with diagonal-pattern; computation power; computational kernel; diagonal distribution; diagonal sparse matrix; graphics processing unit; memory architecture; nonzero distribution; scientific application; scientific computing; sparse matrix-vector multiplication; storage format; two-socket quad-core Intel Xeon system; Arrays; Copper; Graphics processing unit; Indexes; Instruction sets; Kernel; Sparse matrices;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel Processing (ICPP), 2011 International Conference on
  • Conference_Location
    Taipei City
  • ISSN
    0190-3918
  • Print_ISBN
    978-1-4577-1336-1
  • Electronic_ISBN
    0190-3918
  • Type
    conf
  • DOI
    10.1109/ICPP.2011.53
  • Filename
    6047217