DocumentCode :
1925545
Title :
GPGPU Memory Estimation and Optimization Targeting OpenCL Architecture
Author :
Junfeng Zhu ; Gang Chen ; Baifeng Wu
Author_Institution :
Inst. of Comput. Sci. & Technol., Fudan Univ., Shanghai, China
fYear :
2012
fDate :
24-28 Sept. 2012
Firstpage :
449
Lastpage :
458
Abstract :
The enormous computational power available in modern graphics processing units (GPUs) has enabled their wide use for general-purpose applications. However, manual development of high-performance parallel code for GPUs is still very challenging. To fully exploit the capability of GPUs for general-purpose computing on heterogeneous processing platforms, we propose performance estimation and optimization methods targeting the OpenCL architecture. Our approach uses a polyhedral representation of the source program to optimize and allocate the global memory and fast memory of GPUs. By checking the memory access patterns of the program, we discover access instances that can be grouped together using graph coloring. We then estimate the memory performance of the program, with the purpose of eliminating uncoalesced global memory accesses. Next, we apply data space transformation to alter irregular memory access patterns, improving off-chip memory bandwidth by taking advantage of vector data types. Meanwhile, we detect reuse information to allocate data into distinct fast memory regions according to both the properties of the data accesses and the characteristics of the OpenCL memory model, with the purpose of making the best use of the fast on-chip memory. Experimental results on an AMD/ATI HD5850 GPU for a set of commonly used benchmarks show that we achieve 2.1X to 6.7X speedup with respect to the unoptimized versions, and that the presented global memory performance model can estimate global memory performance relatively accurately.
Keywords :
graph colouring; graphics processing units; parallel architectures; storage management; AMD/ATI HD5850 GPU; GPGPU memory estimation; OpenCL memory model; computational power; data accesses; data allocation; data space transformation; fast memory regions; fast on-chip memory; global memory allocation; global memory optimization; global memory performance model; graph coloring; graphics processing units; heterogeneous processing platforms; high-performance parallel codes; memory access patterns; memory performance estimation; off-chip memory bandwidth; optimization methods; optimization targeting OpenCL architecture; performance estimation; source program polyhedral representation; uncoalesced global memory accesses; vector data types; Arrays; Bandwidth; Graphics processing unit; Memory management; Optimization; Vectors; GPGPU; GPU; OpenCL architecture; heterogeneous processing; memory estimation and optimization; polyhedral model;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Cluster Computing (CLUSTER), 2012 IEEE International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4673-2422-9
Type :
conf
DOI :
10.1109/CLUSTER.2012.9
Filename :
6337808