DocumentCode :
1933560
Title :
Automatic data placement into GPU on-chip memory resources
Author :
Chao Li ; Yi Yang ; Zhen Lin ; Huiyang Zhou
fYear :
2015
fDate :
7-11 Feb. 2015
Firstpage :
23
Lastpage :
33
Abstract :
Although graphics processing units (GPUs) rely on thread-level parallelism to hide long off-chip memory access latency, judicious use of on-chip memory resources, including register files, shared memory, and data caches, is critical to application performance. However, explicitly managing GPU on-chip memory resources is a non-trivial task for application developers. More importantly, because on-chip memory resources vary across GPU generations, performance portability has become a daunting challenge. In this paper, we tackle this problem with compiler-driven automatic data placement. We focus on programs that have already been reasonably optimized, either manually by programmers or automatically by compiler tools. Our proposed compiler algorithms refine these programs by revising data placement across the different types of GPU on-chip resources to achieve both performance enhancement and performance portability. Across the 12 benchmarks in our study, our proposed compiler algorithm improves performance by 1.76× on average on an Nvidia GTX480 and by 1.61× on average on a GTX680.
Keywords :
graphics processing units; performance evaluation; shared memory systems; GPU on-chip memory resources; Nvidia GTX480; automatic data placement; compiler algorithm; data caches; off-chip memory access; on-chip memory resources; performance enhancement; performance portability; register files; shared memory; Arrays; Bandwidth; Graphics processing units; Instruction sets; Registers; System-on-chip;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Code Generation and Optimization (CGO), 2015 IEEE/ACM International Symposium on
Conference_Location :
San Francisco, CA
Type :
conf
DOI :
10.1109/CGO.2015.7054184
Filename :
7054184