DocumentCode :
3089397
Title :
Reuse and Refactoring of GPU Kernels to Design Complex Applications
Author :
Sarkar, Santonu ; Mitra, Sayantan ; Srinivasan, Ashok
Author_Institution :
Infosys Labs., Infosys Ltd., Bangalore, India
fYear :
2012
fDate :
10-13 July 2012
Firstpage :
134
Lastpage :
141
Abstract :
Developers of GPU kernels, such as FFT, linear solvers, etc., tune their code extensively in order to obtain optimal performance, making efficient use of different resources available on the GPU. Complex applications are composed of several such kernel components. The software engineering community has performed extensive research on component-based design to build generic and flexible components, such that a component can be reused across diverse applications, rather than optimizing its performance. Since a GPU is used primarily to improve performance, application performance becomes a key design issue. The contribution of our work lies in extending component-based design research in a new direction, dealing with the performance impact of refactoring an application consisting of the composition of highly tuned kernels. Such refactoring can make the composition more effective with respect to GPU resource usage, especially when combined with suitable scheduling. Here we propose a methodology where developers of highly tuned kernels can enable application designers to optimize performance of the composition. Kernel developers characterize the performance of a kernel through its "performance signature". The application designer combines these kernels such that the performance of the refactored kernel is better than the sum of the performances of the individual kernels. This is partly based on the observation that different kernels may make unbalanced use of different GPU resources like different types of memory. Kernels may also have the potential to share data. Refactoring the kernels, combining them, and scheduling them suitably can improve performance. We study different types of potential design optimizations and evaluate their effectiveness on different types of kernels. This may even involve choosing non-optimal parameters for an individual kernel. We analyze how the performance signature of the composition changes from that of the individual kernels through our techniques. We demonstrate that our techniques lead to over 50% improvement with some kernels. Furthermore, the performance of a basic molecular dynamics application can be improved by around 25.7%, on a Fermi GPU, compared with an un-refactored implementation.
Keywords :
object-oriented programming; physics computing; scheduling; software maintenance; software performance evaluation; software reusability; Fermi GPU; GPU kernel refactoring; GPU kernel reusability; GPU resource usage; application performance; complex applications design; component-based design; data sharing; design issue; design optimization; flexible components; generic components; graphics processing units; kernel scheduling; molecular dynamics application; nonoptimal parameters; performance improvement; performance optimization; performance signature; software engineering community; Design methodology; Graphics processing unit; Hardware; Instruction sets; Kernel; Merging; Registers; component reuse; gpu; kernel composition; refactoring;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel and Distributed Processing with Applications (ISPA), 2012 IEEE 10th International Symposium on
Conference_Location :
Leganes
Print_ISBN :
978-1-4673-1631-6
Type :
conf
DOI :
10.1109/ISPA.2012.26
Filename :
6280285