Title :
A Comparison of Performance Tunabilities between OpenCL and OpenACC
Author :
Sugawara, Mariko ; Hirasawa, Shoichi ; Komatsu, Kazuhiko ; Takizawa, Hiroyuki ; Kobayashi, Hideo
Author_Institution :
Tohoku Univ., Sendai, Japan
Abstract :
To design and develop auto-tuning mechanisms for OpenACC, it is important to clarify how OpenACC differs from conventional GPU programming models in the programming and tuning techniques it makes available, which we call performance tunabilities. This paper therefore discusses the performance tunabilities of OpenACC and OpenCL. Because OpenACC cannot synchronize threads running on GPUs, some important tuning techniques cannot be expressed in OpenACC. We therefore also design an additional compiler directive for thread synchronization. Evaluation results show that both OpenCL and OpenACC need architecture-aware optimizations, and that similar approaches to performance optimization are effective for both. The additional directive allows OpenACC to describe more tuning techniques in the same way as OpenCL. Since OpenACC is more productive than OpenCL, especially for legacy application migration, it is a very promising programming model if it can achieve the same performance as conventional GPU programming models such as CUDA and OpenCL.
Keywords :
graphics processing units; parallel architectures; software maintenance; CUDA; GPU programming models; OpenACC; OpenCL; architecture-aware optimizations; auto tuning mechanisms; compiler directive; legacy application migration; performance optimization; performance tunabilities; thread synchronization; Data transfer; Instruction sets; Kernel; Optimization; Programming; Synchronization; Autotuning
Conference_Title :
Embedded Multicore SoCs (MCSoC), 2013 IEEE 7th International Symposium on
Conference_Location :
Tokyo
DOI :
10.1109/MCSoC.2013.31