DocumentCode :
2906687
Title :
Strassen's Matrix Multiplication on GPUs
Author :
Li, Junjie ; Ranka, Sanjay ; Sahni, Sartaj
Author_Institution :
Dept. of Comput. & Inf. Sci. & Eng., Univ. of Florida, Gainesville, FL, USA
fYear :
2011
fDate :
7-9 Dec. 2011
Firstpage :
157
Lastpage :
164
Abstract :
We provide efficient single-precision and integer GPU implementations of Strassen's algorithm as well as of Winograd's variant. On an NVIDIA C1060 GPU, a speedup of 32% (35%) is obtained for Strassen's 4-level implementation and 33% (36%) for Winograd's variant relative to the sgemm (integer version of sgemm) code in CUBLAS 3.0 when multiplying 16384×16384 matrices. The maximum numerical error for the single-precision implementations is about 2 orders of magnitude higher than that for sgemm when n = 16384 and is zero for the integer implementations.
Keywords :
computer graphic equipment; coprocessors; mathematics computing; matrix multiplication; CUBLAS 3.0; NVIDIA C1060 GPU; Strassen matrix multiplication; Winograd variant; integer GPU; sgemm; Complexity theory; Graphics processing unit; Kernel; Matrix decomposition; Memory management; Vectors; CUDA; GPU; Strassen's algorithm; Winograd's variant; accuracy; matrix multiplication
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel and Distributed Systems (ICPADS), 2011 IEEE 17th International Conference on
Conference_Location :
Tainan
ISSN :
1521-9097
Print_ISBN :
978-1-4577-1875-5
Type :
conf
DOI :
10.1109/ICPADS.2011.130
Filename :
6121273