Author :
Fang, Jianbin ; Sips, Henk ; Jaaskelainen, Pekka ; Varbanescu, Ana Lucia
Abstract :
Due to the diversity of processor architectures and application memory access patterns, the performance impact of using local memory in OpenCL kernels has become unpredictable. For example, enabling the use of local memory for an OpenCL kernel can be beneficial for execution on a GPU, but can lead to performance losses when running on a CPU. To address this unpredictability, we propose an empirical approach: by disabling the use of local memory in OpenCL kernels, we enable users to compare the kernel versions with and without local memory, and then choose the best-performing version for a given platform. To this end, we have designed Grover, a method to automatically remove local memory usage from OpenCL kernels. In particular, we create a correspondence between the global and local memory spaces, which is used to replace local memory accesses by global memory accesses. We have implemented this scheme in the LLVM framework as a compiler pass, which automatically transforms an OpenCL kernel with local memory to a version without it. We have validated Grover with 11 applications, and found that it can successfully disable local memory usage for all of them. We have compared the kernels with and without local memory on three different processors, and found performance improvements for more than a third of the test cases after Grover disabled local memory usage. We conclude that such a compiler pass can be beneficial for performance, and, because it is fully automated, it can be used as an auto-tuning step for OpenCL kernels.
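The transformation described in the abstract can be illustrated with a minimal sketch. It is not Grover's actual output or implementation; it is a plain C simulation of a hypothetical kernel that reverses elements within each work-group tile. The names `TILE`, `reverse_tiles_local`, and `reverse_tiles_global` are assumptions made for illustration. The first function stages data in an array standing in for OpenCL `__local` memory; the second uses the correspondence `lm[i] <-> in[group * TILE + i]` to read directly from global memory instead.

```c
#include <assert.h>
#include <stddef.h>

#define TILE 4  /* hypothetical work-group size, chosen for illustration */

/* Version WITH local memory: each group stages its tile, then reads from it.
 * The outer loop simulates work-groups; the inner loops simulate work-items. */
void reverse_tiles_local(const int *in, int *out, size_t n)
{
    assert(n % TILE == 0);
    for (size_t group = 0; group < n / TILE; ++group) {
        int lm[TILE];  /* stands in for: __local int lm[TILE]; */
        for (size_t lid = 0; lid < TILE; ++lid)
            lm[lid] = in[group * TILE + lid];      /* staging store */
        /* In OpenCL, barrier(CLK_LOCAL_MEM_FENCE) would separate the phases. */
        for (size_t lid = 0; lid < TILE; ++lid)
            out[group * TILE + lid] = lm[TILE - 1 - lid];
    }
}

/* Version WITHOUT local memory: every read lm[i] is replaced by the global
 * element it was loaded from, in[group * TILE + i]. The staging store and
 * the barrier are no longer needed. */
void reverse_tiles_global(const int *in, int *out, size_t n)
{
    assert(n % TILE == 0);
    for (size_t group = 0; group < n / TILE; ++group)
        for (size_t lid = 0; lid < TILE; ++lid)
            out[group * TILE + lid] = in[group * TILE + (TILE - 1 - lid)];
}
```

Both functions produce the same output for any input whose length is a multiple of `TILE`; which variant runs faster on a given device is exactly the empirical question the abstract raises.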
Keywords :
application program interfaces; memory architecture; multiprocessing systems; operating system kernels; program compilers; CPU; GPU; Grover; LLVM framework; OpenCL kernels; application memory access patterns; automatic local memory usage removal; autotuning process; compiling pass; disable local memory usage; empirical approach; global memory access; global memory space; local memory access; local memory space; performance improvement; performance losses; processor architectures; Data structures; Equations; Hardware; Indexes; Instruction sets; Kernel; Memory management; Local Memory; OpenCL; Reverse Engineering;