Title : 
An Effective Approach for Implementing Sparse Matrix-Vector Multiplication on Graphics Processing Units
         
        
            Author : 
Abu-Sufah, Walid ; Karim, A.A.
         
        
            Author_Institution : 
Univ. of Illinois at Urbana-Champaign, Urbana, IL, USA
         
        
        
        
        
        
            Abstract : 
Sparse matrix-vector multiplication (SpMV) is often a performance bottleneck in iterative solvers. Recently, graphics processing units (GPUs) have been deployed to accelerate this operation. We present BTJAD, a blocked version of the Transposed Jagged Diagonal storage format that is tailored for GPUs. We develop a highly optimized SpMV kernel that exploits the properties of the BTJAD storage format and reuses source-vector values already loaded into the registers of a GPU. Using 62 matrices with different sparsity patterns and executing on an NVIDIA Tesla T10 GPU, we compare the performance of our kernel with that of the SpMV kernels in NVIDIA's library. Our kernel achieves higher throughput for matrices whose rows vary in their number of nonzeros, outperforming the best available kernels by up to 4.67x. On the Fermi-class GeForce GTX480 GPU, which has a larger register file, the maximum speedup achieved by our kernel rises to 6.6x.
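The abstract does not spell out the BTJAD layout, so the following is only a minimal sequential sketch of a plain (unblocked) transposed-jagged-diagonal SpMV. It assumes the usual description of that layout: nonzeros are compressed upward within each column, and columns are sorted by decreasing nonzero count. The names `to_tjad`, `spmv_tjad`, `start`, and `perm` are hypothetical, and this is not the authors' GPU kernel. It is meant to show why the layout favors reuse of the source vector: position `k` within every jagged diagonal always multiplies the same entry `x_perm[k]`, so a GPU thread assigned to position `k` could keep that entry in a register.

```python
def to_tjad(dense):
    """Build a transposed-jagged-diagonal layout from a dense list of rows.

    Assumed layout (not taken from the paper): nonzeros are compressed
    upward inside each column, columns are sorted by decreasing nonzero
    count, and the rows of the compressed matrix are stored back to back.
    """
    n_rows = len(dense)
    n_cols = len(dense[0]) if n_rows else 0
    # Nonzeros of each column as (row, value) pairs, in row order.
    cols = [[(i, dense[i][j]) for i in range(n_rows) if dense[i][j] != 0]
            for j in range(n_cols)]
    # perm[k] is the original index of the k-th longest column.
    perm = sorted(range(n_cols), key=lambda j: -len(cols[j]))
    max_len = len(cols[perm[0]]) if n_cols else 0
    values, row_idx, start = [], [], [0]
    for d in range(max_len):
        for j in perm:
            if len(cols[j]) <= d:
                break  # columns are sorted, so all later ones are shorter
            r, v = cols[j][d]
            values.append(v)
            row_idx.append(r)
        start.append(len(values))  # start[d+1] marks the end of diagonal d
    return values, row_idx, start, perm, n_rows


def spmv_tjad(values, row_idx, start, perm, n_rows, x):
    """Compute y = A @ x from the layout produced by to_tjad."""
    y = [0.0] * n_rows
    x_perm = [x[j] for j in perm]  # source vector in sorted-column order
    for d in range(len(start) - 1):
        lo, hi = start[d], start[d + 1]
        for k in range(hi - lo):
            # Position k always uses x_perm[k], whatever the diagonal d.
            y[row_idx[lo + k]] += values[lo + k] * x_perm[k]
    return y


A = [[1, 0, 2],
     [0, 3, 0],
     [4, 0, 5]]
x = [1, 2, 3]
y = spmv_tjad(*to_tjad(A), x)
```

In a parallel version the scattered updates to `y` would need atomics or some partitioning of the rows; blocking is one plausible way to confine those updates, though how BTJAD actually does so is not stated in the abstract.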
         
        
            Keywords : 
graphics processing units; iterative methods; matrix multiplication; optimisation; parallel architectures; sparse matrices; BTJAD storage format; Fermi class GeForce GTX480 GPU; NVIDIA Tesla T10; SpMV kernel; graphics processing unit; iterative solver; optimization; sparse matrix vector multiplication; sparsity pattern; transposed jagged diagonal storage format; Graphics processing unit; Instruction sets; Kernel; Optimization; Registers; Sparse matrices; Vectors; CUDA; GPU; SpMV; sparse linear algebra;
         
        
        
        
            Conference_Title : 
2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems (HPCC-ICESS)
         
        
            Conference_Location : 
Liverpool
         
        
            Print_ISBN : 
978-1-4673-2164-8
         
        
        
            DOI : 
10.1109/HPCC.2012.68