Title :
Performance Traps in OpenCL for CPUs
Author :
Shen, Jie ; Fang, Jianbin ; Sips, Henk ; Varbanescu, Ana Lucia
Author_Institution :
Parallel & Distrib. Syst. Group, Delft Univ. of Technol., Delft, Netherlands
fDate :
Feb. 27 2013-March 1 2013
Abstract :
With its design concept of cross-platform portability, OpenCL can be used not only on GPUs (for which it is quite popular), but also on CPUs. Whether porting GPU programs to CPUs, or simply writing new code for CPUs, using OpenCL brings up the performance issue, usually raised in one of two forms: "OpenCL is not performance portable!" or "Why using OpenCL for CPUs after all?!". We argue that both issues can be addressed by a thorough study of the factors that impact the performance of OpenCL on CPUs. This analysis is the focus of this paper. Specifically, starting from the two main architectural mismatches between many-core CPUs and the OpenCL platform (parallelism granularity and the memory model), we identify eight such performance "traps" that lead to performance degradation in OpenCL for CPUs. Using multiple code examples, from both synthetic and real-life benchmarks, we quantify the impact of these traps, showing how avoiding them can give up to 10 times better performance. Furthermore, we point out that the solutions we provide for avoiding these traps are simple and generic code transformations, which can be easily adopted by either programmers or automated tools. Therefore, we conclude that a certain degree of OpenCL inter-platform performance portability, while indeed not a given, can be achieved by simple and generic code transformations.
Keywords :
electronic data interchange; graphics processing units; multiprocessing systems; open systems; parallel architectures; performance evaluation; program compilers; GPU program; OpenCL interplatform; cross-platform portability; generic code transformations; main architectural mismatches; many-core CPU; memory model; multiple code examples; parallelism granularity; performance degradation; performance traps; real-life benchmarks; Benchmark testing; Data transfer; Graphics processing units; Hardware; Kernel; Parallel processing; Performance evaluation; Many-core CPUs; OpenCL; Performance portability;
Conference_Titel :
Parallel, Distributed and Network-Based Processing (PDP), 2013 21st Euromicro International Conference on
Conference_Location :
Belfast
Print_ISBN :
978-1-4673-5321-2
Electronic_ISBN :
1066-6192
DOI :
10.1109/PDP.2013.16