DocumentCode :
167360
Title :
A Linear Performance-Breakdown Model for GPU Programming Optimization Guidance
Author :
Chapa M, Mario A. ; Hiroyuki, Sato
Author_Institution :
Dept. of Electr. Eng. & Inf. Sci., Univ. of Tokyo, Tokyo, Japan
fYear :
2014
fDate :
19-23 May 2014
Firstpage :
596
Lastpage :
603
Abstract :
Graphics Processing Units (GPUs) are used as computing accelerators. Nevertheless, writing efficient GPU programs is a difficult and time-consuming task. In this paper we present the Linear Performance Breakdown Model (LBPM), an analytic model that breaks down the execution time of a GPU kernel program into the three major components that affect its running time. The model can be used as a tool that provides guidelines for detecting performance bottlenecks. Our approach incorporates three elements: the Global-to-Shared Memory Time slice, the Shared-to-Private Time slice and the Processing Units Time slice. These three factors are integrated into a performance model formula by applying the Normalized Least Squares Method (NLSM). The resulting coefficients are used to construct a performance breakdown graph that reveals the effect of each element on the total execution time of the kernel program. We demonstrate the proposed model on two common numeric routines, Single-Precision General Matrix Multiplication (SGMM) and Fast Fourier Transform (FFT), using results obtained from two GPU devices: an A8-3870 AMD Accelerated Processing Unit (APU) and a GTX 660 Nvidia GPU.
Keywords :
fast Fourier transforms; graph theory; graphics processing units; least squares approximations; matrix multiplication; shared memory systems; software performance evaluation; A8-3870 AMD accelerated processing unit; APU; FFT; GPU devices; GPU kernel program execution; GPU programming optimization guidance; GTX 660 Nvidia GPU; LBPM; NLSM; SGMM; analytic model; computing accelerators; fast Fourier transform; global-to-shared memory time slice; graphic processing units; kernel program; linear performance-breakdown model; normalized least squares method; performance breakdown graph; processing unit time slice; shared-to-private time slice; single-precision general matrix multiplication; time consuming task; Computational modeling; Computer architecture; Graphics processing units; Kernel; Performance evaluation; Programming; Registers; GPGPU; Modeling; OpenCL;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel & Distributed Processing Symposium Workshops (IPDPSW), 2014 IEEE International
Conference_Location :
Phoenix, AZ
Print_ISBN :
978-1-4799-4117-9
Type :
conf
DOI :
10.1109/IPDPSW.2014.70
Filename :
6969440