Title :
A statistical performance prediction model for OpenCL kernels on NVIDIA GPUs
Author :
Karami, Armine ; Mirsoleimani, Sayyed Ali ; Khunjush, Farshad
Author_Institution :
Sch. of Electr. & Comput. Eng., Shiraz Univ., Shiraz, Iran
Abstract :
Understanding the performance bottlenecks of applications in high performance computing can lead to dramatic improvements in application performance. For example, a key problem in GPU programming is finding performance bottlenecks and resolving them to reach the best possible performance. In GPU architectures, these bottlenecks span a variety of factors, such as memory access latency, branch divergence, utilization, and the amount of available parallelism. Moreover, simple profiling cannot reveal the relations between these bottlenecks. In this paper, we propose a statistical performance model that not only helps find bottlenecks but also shows the relations between them, which is not possible with a profiler alone. The OpenCL programming standard can be used on a variety of platforms (e.g., CPUs and GPUs); therefore, a program written for one platform can be ported to other platforms with minimal effort. For this reason, we selected the OpenCL programming standard to design our performance model for NVIDIA GPUs. We first measure the values of a GPU's performance counters for the selected benchmarks. Then, using the measured results and applying a regression model and principal component analysis, we develop a model that shows how different GPU parameters account for application performance bottlenecks. Our results show that the proposed model can predict application behavior with 91% accuracy. Moreover, the proposed model is able to characterize unknown applications based on their performance similarities with an existing database of benchmarks in order to predict their likely performance bottlenecks.
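The workflow outlined in the abstract (collect performance counters per kernel, reduce them with principal component analysis, then fit a regression) can be illustrated with a minimal sketch. This is not the authors' implementation: the data below is synthetic, the six counter columns and the two latent factors are hypothetical, and principal component regression is assumed here as one plausible way to combine the two techniques.

```python
import numpy as np


def fit_pcr(X, y, n_components):
    """Principal component regression: standardize, project, least squares."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0                      # guard against constant counters
    Z = (X - mean) / std
    # Rows of Vt are the principal directions of the standardized counters.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    components = Vt[:n_components]
    explained = s**2 / np.sum(s**2)          # variance share per component
    scores = Z @ components.T
    A = np.column_stack([np.ones(len(scores)), scores])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return {"mean": mean, "std": std, "components": components,
            "explained": explained, "coef": coef}


def predict_pcr(model, X):
    """Predict the response for new counter vectors using a fitted model."""
    Z = (X - model["mean"]) / model["std"]
    scores = Z @ model["components"].T
    return model["coef"][0] + scores @ model["coef"][1:]


def r_squared(y, y_hat):
    """Coefficient of determination."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in for measured data (assumption): 40 kernels, 6 counters
# driven by 2 hidden factors, e.g. "memory-bound" and "divergence-bound".
rng = np.random.default_rng(0)
latent = rng.normal(size=(40, 2))
W = np.array([[1.0, 0.8, 0.1, -0.5, 0.9, 0.0],
              [0.0, 0.3, 1.0, 0.7, -0.4, 1.0]])
X = latent @ W + 0.05 * rng.normal(size=(40, 6))
y = 3.0 * latent[:, 0] - 2.0 * latent[:, 1] + 0.1 * rng.normal(size=40)

model = fit_pcr(X, y, n_components=2)
r2 = r_squared(y, predict_pcr(model, X))
print("variance explained by 2 components:", model["explained"][:2].sum())
print("R^2 on the training data:", r2)
```

The loadings in `model["components"]` indicate which counters move together, which is the kind of relation between bottlenecks that a per-counter profile does not show directly. With real measurements, the fit should be evaluated on held-out kernels rather than on the training data as in this sketch.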
Keywords :
graphics processing units; parallel architectures; principal component analysis; regression analysis; GPU architectures; GPU parameters; GPU performance counters; GPU programming; NVIDIA GPU; OpenCL kernels; OpenCL programming standard; branch divergence; high performance computing; memory access latency; parallelism; principle component analysis; profiler; regression model; statistical performance model; statistical performance prediction model; Analytical models; Benchmark testing; Graphics processing units; Kernel; Predictive models; Principal component analysis; Programming; GPU; OpenCL; Statistical Performance Model;
Conference_Title :
Computer Architecture and Digital Systems (CADS), 2013 17th CSI International Symposium on
Conference_Location :
Tehran
Print_ISBN :
978-1-4799-0562-1
DOI :
10.1109/CADS.2013.6714232