DocumentCode :
2074510
Title :
Register and thread structure optimization for GPUs
Author :
Yun Liang ; Zheng Cui ; Rupnow, Kyle ; Deming Chen
Author_Institution :
Center for Energy-Efficient Comput. & Applic., Peking Univ., Beijing, China
fYear :
2013
fDate :
22-25 Jan. 2013
Firstpage :
461
Lastpage :
466
Abstract :
GPUs are an increasingly popular implementation platform for a variety of general-purpose applications, from mobile and embedded devices to high-performance computing. The CUDA and OpenCL parallel programming models enable easy utilization of the GPU's resources. However, tuning GPU applications' performance is a complex and labor-intensive task. Software programmers employ a variety of optimization techniques to explore tradeoffs between thread parallelism and the performance of a single thread. However, prior techniques ignore register allocation, a significant factor in single-thread performance that also indirectly affects the number of simultaneously active threads. In this paper, we show that joint optimization of register allocation and thread structure has great potential to significantly improve performance. However, the design space for this joint optimization can be large; therefore, we develop performance metrics appropriate for evaluation within a compiler's inner loop and efficient design space exploration techniques that use the metrics to narrow the search space. Across a range of GPU applications, we achieve an average performance speedup of 1.33X (up to 1.73X) with design space exploration 355X faster than exhaustive search.
Keywords :
embedded systems; graphics processing units; logic design; optimisation; CUDA; GPU applications; OpenCL parallel programming models; design space exploration techniques; embedded devices; general purpose applications; high performance computing; mobile devices; performance metrics; register allocation; register structure optimization; search space; software programmers; thread parallelism; thread structure optimization; Graphics processing units; Instruction sets; Kernel; Measurement; Registers; Resource management; Space exploration;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Design Automation Conference (ASP-DAC), 2013 18th Asia and South Pacific
Conference_Location :
Yokohama
ISSN :
2153-6961
Print_ISBN :
978-1-4673-3029-9
Type :
conf
DOI :
10.1109/ASPDAC.2013.6509639
Filename :
6509639