DocumentCode :
228659
Title :
Scalable Kernel Fusion for Memory-Bound GPU Applications
Author :
Wahib, Mohamed ; Maruyama, Naoya
Author_Institution :
CREST, RIKEN Adv. Inst. for Comput. Sci., Kobe, Japan
fYear :
2014
fDate :
16-21 Nov. 2014
Firstpage :
191
Lastpage :
202
Abstract :
GPU implementations of HPC applications relying on finite difference methods can include tens of kernels that are memory-bound. Kernel fusion can improve performance by reducing data traffic to off-chip memory: kernels that share data arrays are fused into larger kernels, where on-chip cache holds the data reused by instructions originating from different kernels. The main challenges are a) searching for the optimal kernel fusions while constrained by data dependencies and kernels' precedences and b) effectively applying kernel fusion to achieve speedup. This paper introduces a problem definition and proposes a scalable method for searching the space of possible kernel fusions to identify optimal kernel fusions for large problems. The paper also proposes a codeless performance upper-bound projection model to achieve effective fusions. Results show that using the proposed scalable method for kernel fusion improved the performance of two real-world applications containing tens of kernels by 1.35x and 1.2x.
Keywords :
cache storage; finite difference methods; graphics processing units; parallel processing; performance evaluation; HPC applications; codeless performance upper-bound projection model; data arrays; data dependencies; data traffic; finite difference methods; kernel precedences; memory-bound GPU applications; memory-bound kernels; off-chip memory; on-chip cache; optimal kernel fusions; scalable kernel fusion; Arrays; Graphics processing units; Instruction sets; Kernel; Meteorology; Optimization; System-on-chip;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
High Performance Computing, Networking, Storage and Analysis, SC14: International Conference for
Conference_Location :
New Orleans, LA
Print_ISBN :
978-1-4799-5499-5
Type :
conf
DOI :
10.1109/SC.2014.21
Filename :
7013003