DocumentCode
2136356
Title
Automatic tuning matrix multiplication performance on graphics hardware
Author
Jiang, Changhao ; Snir, Marc
Author_Institution
Illinois Univ., Urbana-Champaign, IL, USA
fYear
2005
fDate
17-21 Sept. 2005
Firstpage
185
Lastpage
194
Abstract
In order to utilize the tremendous computing power of graphics hardware and to automatically adapt to the fast and frequent changes in its architecture and performance characteristics, this paper implements an automatic tuning system that generates high-performance matrix-multiplication implementations on graphics hardware. The automatic tuning system uses a parameterized code generator to generate multiple versions of matrix multiplication, whose performance is empirically evaluated by actual execution on the target platform. An ad-hoc search engine is employed to search the implementation space for the version that yields the best performance. In contrast to similar systems on CPUs, which utilize cache blocking, register tiling, and instruction scheduling as tuning strategies, this paper identifies and exploits several tuning strategies that are unique to graphics hardware. These tuning strategies include optimizing for multiple render targets, using SIMD instructions with data packing, and overcoming limitations on instruction count and dynamic branch instructions. The generated implementations achieve performance comparable to an expert manually tuned version, in spite of the significant overhead incurred by the use of the high-level BrookGPU language.
Keywords
computer graphic equipment; coprocessors; matrix multiplication; SIMD instructions; ad-hoc search engine; automatic tuning matrix multiplication; cache blocking; data packing; dynamic branch instruction; graphics hardware; instruction scheduling tuning; multiple-render-targets; parameterized code generator; register tiling; Central Processing Unit; Computer architecture; Graphics; Hardware; High performance computing; Pipeline processing; Programming profession; Signal processing algorithms; Software performance; Sparse matrices;
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel Architectures and Compilation Techniques, 2005. PACT 2005. 14th International Conference on
ISSN
1089-795X
Print_ISBN
0-7695-2429-X
Type
conf
DOI
10.1109/PACT.2005.10
Filename
1515592