DocumentCode
3205233
Title
Automatic Library Generation for BLAS3 on GPUs
Author
Cui, Huimin ; Wang, Lei ; Xue, Jingling ; Yang, Yang ; Feng, Xiaobing
Author_Institution
Inst. of Comput. Technol., Chinese Acad. of Sci., Beijing, China
fYear
2011
fDate
16-20 May 2011
Firstpage
255
Lastpage
265
Abstract
High-performance libraries, the performance-critical building blocks for high-level applications, will assume greater importance on modern processors as these processors become more complex and diverse. However, automatic library generators are still immature, forcing library developers to tune their libraries manually to meet their performance objectives. We are developing a new script-controlled compilation framework that helps domain experts avoid much of the tedious and error-prone work of manual tuning by enabling them to leverage their expertise and reuse past optimization experience. We focus on demonstrating the improved performance and productivity obtained by using our framework to tune BLAS3 routines on three GPU platforms: speedups over CUBLAS of up to 5.4x on the NVIDIA GeForce 9800, 2.8x on the GTX285, and 3.4x on the Fermi Tesla C2050. Our results highlight the potential benefits of exploiting domain expertise and the relations between different routines (in terms of their algorithms and data structures).
Keywords
automatic programming; computer graphic equipment; coprocessors; software libraries; BLAS3 library generator; Fermi Tesla C2050 GPU; GTX285 GPU; NVIDIA GeForce 9800 GPU; basic linear algebra subprograms; graphics processing unit; script-controlled compilation framework; Graphics processing unit; Instruction sets; Libraries; Optimization; Resource management; Symmetric matrices; Tuning;
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel & Distributed Processing Symposium (IPDPS), 2011 IEEE International
Conference_Location
Anchorage, AK
ISSN
1530-2075
Print_ISBN
978-1-61284-372-8
Electronic_ISBN
1530-2075
Type
conf
DOI
10.1109/IPDPS.2011.33
Filename
6012842