Title :
Optimization of Sparse Matrix-Vector Multiplication with Variant CSR on GPUs
Author :
Feng, Xiaowen ; Jin, Hai ; Zheng, Ran ; Hu, Kan ; Zeng, Jingxiang ; Shao, Zhiyuan
Author_Institution :
Services Comput. Technol. & Syst. Lab., Huazhong Univ. of Sci. & Technol., Wuhan, China
Abstract :
Sparse matrix-vector multiplication (SpMV) is one of the most important yet challenging problems in computational science. It is a memory-bound application whose performance depends largely on the input matrix and the underlying architecture. Many researchers have explored a variety of optimization techniques for SpMV, and one of the most promising directions is adapting the storage format to the underlying architecture. Alternative storage formats can greatly reduce memory pressure; however, they often leave computational resources underutilized. Therefore, a new storage format, called Compressed Sparse Row with Segmented Interleave Combination (SIC), is proposed. Derived from the Compressed Sparse Row (CSR) format, SIC employs an interleave combination pattern that combines a certain number of CSR rows into a new SIC row. To further improve performance, segmented processing is also introduced. Based on empirical data, we also develop an automatic SIC-based SpMV that is applicable to all matrices. Experimental results show that our approach outperforms the NVIDIA CSR vector kernel, achieving a speedup of up to 12.6×. It also delivers performance comparable to the Hybrid format, with a speedup of up to 2.89×.
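The starting point named in the abstract (CSR SpMV) and the idea of merging several CSR rows into one interleaved row can be illustrated with a small sequential Python sketch. The CSR routine follows the standard layout (row_ptr, col_idx, vals). The combined layout is a hypothetical illustration only: the helper names interleave_rows and interleaved_spmv, the round-robin interleaving order, the zero padding, and the group size k are assumptions, not the paper's actual SIC definition. Segmented processing and the GPU kernel itself are not modeled.

```python
from typing import List, Tuple

# (group's first row, number of rows in group, interleaved slots)
# each slot is (local_row, col, val); HYPOTHETICAL layout, not the paper's SIC
Group = Tuple[int, int, List[Tuple[int, int, float]]]


def csr_spmv(row_ptr: List[int], col_idx: List[int],
             vals: List[float], x: List[float]) -> List[float]:
    """Standard CSR SpMV: y = A @ x, where row_ptr has n_rows + 1 entries."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for j in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[j] * x[col_idx[j]]
        y[i] = acc
    return y


def interleave_rows(row_ptr: List[int], col_idx: List[int],
                    vals: List[float], k: int) -> List[Group]:
    """Hypothetical: merge every k consecutive CSR rows into one combined row.

    Entries are interleaved round-robin (t-th nonzero of each member row in
    turn); shorter rows are padded with zero-valued slots so every member
    row occupies the same number of slots.
    """
    n_rows = len(row_ptr) - 1
    groups: List[Group] = []
    for start in range(0, n_rows, k):
        rows = range(start, min(start + k, n_rows))
        width = max(row_ptr[r + 1] - row_ptr[r] for r in rows)
        combined = []
        for t in range(width):
            for local, r in enumerate(rows):
                j = row_ptr[r] + t
                if j < row_ptr[r + 1]:
                    combined.append((local, col_idx[j], vals[j]))
                else:
                    combined.append((local, 0, 0.0))  # padding slot
        groups.append((start, len(rows), combined))
    return groups


def interleaved_spmv(groups: List[Group], n_rows: int,
                     x: List[float]) -> List[float]:
    """SpMV over the hypothetical combined layout, one accumulator per row."""
    y = [0.0] * n_rows
    for start, count, combined in groups:
        partial = [0.0] * count
        for local, c, v in combined:
            partial[local] += v * x[c]
        for local in range(count):
            y[start + local] = partial[local]
    return y


if __name__ == "__main__":
    # A = [[1,0,2],[0,3,0],[4,5,6],[0,0,7]] in CSR form
    row_ptr = [0, 2, 3, 6, 7]
    col_idx = [0, 2, 1, 0, 1, 2, 2]
    vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
    x = [1.0, 2.0, 3.0]
    print(csr_spmv(row_ptr, col_idx, vals, x))  # [7.0, 6.0, 32.0, 21.0]
    groups = interleave_rows(row_ptr, col_idx, vals, k=2)
    print(interleaved_spmv(groups, 4, x))        # [7.0, 6.0, 32.0, 21.0]
```

Both routines compute the same product; the combined layout only changes the order in which entries are stored. On a GPU, one plausible motivation for such a layout is that adjacent threads would read adjacent slots, so the entries of several short rows are accessed contiguously; this rationale is an assumption, not a claim taken from the paper.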
Keywords :
graphics processing units; mathematics computing; matrix multiplication; optimisation; sparse matrices; GPU; compressed sparse row; computational science; memory-bound application; optimization; segmented interleave combination; segmented processing; sparse matrix-vector multiplication; storage format; variant CSR; Computer architecture; Graphics processing unit; Instruction sets; Kernel; Sparse matrices; Vectors; Compressed Sparse Row; GPU; Interleaved Row Combination; Segmented Processing; Sparse Matrix-Vector Multiplication;
Conference_Title :
Parallel and Distributed Systems (ICPADS), 2011 IEEE 17th International Conference on
Conference_Location :
Tainan
Print_ISBN :
978-1-4577-1875-5
DOI :
10.1109/ICPADS.2011.91