DocumentCode :
1827136
Title :
Fast Linear Algebra on GPU
Author :
Polok, Lukas ; Smrz, Pavel
Author_Institution :
IT4Innovations Centre of Excellence, Brno University of Technology, Brno, Czech Republic
fYear :
2012
fDate :
25-27 June 2012
Firstpage :
439
Lastpage :
444
Abstract :
GPUs have been successfully used to accelerate many mathematical functions and libraries. A common limitation of those libraries is the minimum size of the primitives they must handle in order to achieve significant speedups over their CPU versions. This minimum size requirement can prove prohibitive for many applications. It can be relaxed by batching operations so that enough data is available to perform the calculations at maximum efficiency on the GPU. A fast OpenCL implementation of two basic vector functions, vector reduction and vector scaling, is described in this paper. Its performance is analyzed by running benchmarks on two of the most common GPUs in use, NVIDIA Tesla and Fermi GPUs. The reported experimental results show that our implementation significantly outperforms CUBLAS, the current state-of-the-art GPU-based basic linear algebra library.
Keywords :
graphics processing units; linear algebra; parallel architectures; parallel languages; CUBLAS; GPU; OpenCL implementation; basic vector function; batching operation; linear algebra library; mathematical function; vector reduction; vector scaling; Benchmark testing; Graphics processing unit; Instruction sets; Kernel; Libraries; Memory management; Vectors; BLAS; CUDA; GPU; OpenCL; linear algebra; parallel reduction;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems (HPCC-ICESS)
Conference_Location :
Liverpool
Print_ISBN :
978-1-4673-2164-8
Type :
conf
DOI :
10.1109/HPCC.2012.66
Filename :
6332205