Title :
A Performance and Energy Consumption Analytical Model for GPU
Author :
Luo, Cheng ; Suda, Reiji
Author_Institution :
Grad. Sch. of Inf. Sci. & Technol., Univ. of Tokyo, Tokyo, Japan
Abstract :
Even with powerful parallel hardware, it is difficult to improve application performance and reduce energy consumption without understanding the performance bottlenecks of parallel programs on GPU architectures. To give programmers better insight into the performance and energy bottlenecks of parallel applications on GPU architectures, we propose two models: an execution time prediction model and an energy consumption prediction model. The execution time prediction model (ETPM) estimates the execution time of massively parallel programs while taking instruction-level and thread-level parallelism into account. ETPM contains two components: a memory sub-model and a computation sub-model. The memory sub-model estimates the cost of memory instructions from the number of active threads and the GPU memory bandwidth; correspondingly, the computation sub-model estimates the cost of computation instructions from the number of active threads and the application's arithmetic intensity. We use Ocelot to analyze PTX code and obtain input parameters for the two sub-models, such as the number of memory transactions and the data size. Based on the two sub-models, the analytical model estimates the cost of each instruction while accounting for instruction-level and thread-level parallelism, and thereby the overall execution time of an application. The energy consumption prediction model (ECPM) estimates the total energy consumption based on data from the ETPM. We compare the models' predictions with actual executions on a GTX 260 and a Tesla C2050. The results show that the models achieve nearly 90% accuracy on average for the benchmarks we used.
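A minimal sketch of how such sub-models might combine, assuming a roofline-style overlap of memory and computation; the symbol names and the max-based combination are illustrative placeholders, since the abstract does not give the paper's actual equations:

  T_mem   ≈ (N_trans × S_trans) / BW_eff(n_active)    (memory sub-model: transaction count, transaction size, effective bandwidth)
  T_comp  ≈ N_comp / Tp(n_active, AI)                 (computation sub-model: computation instruction count, throughput at arithmetic intensity AI)
  T_total ≈ max(T_mem, T_comp)                        (assumes memory and computation overlap through ILP and TLP)
  E_total ≈ P_avg × T_total                           (energy from average power and predicted execution time)

where N_trans, S_trans, and N_comp would be taken from Ocelot's PTX analysis, and n_active is the number of active threads.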
Keywords :
costing; energy consumption; graphics processing units; instruction sets; memory architecture; multi-threading; parallel architectures; performance evaluation; ECPM; ETPM; GPU architectures; GPU memory bandwidth; PTX codes; active threads; application performance; arithmetic intensity; computation instructions; computation sub-model; cost estimation; data size; energy consumption analytical model; energy consumption prediction model; energy-saving bottleneck; execution time estimation; execution time prediction model; instruction-level parallelism; memory instructions; memory sub-model; memory transaction number; Ocelot; parallel applications; parallel execution; parallel programs; performance bottlenecks; powerful hardware; thread-level parallelism; Computational modeling; Energy consumption; Graphics processing unit; Instruction sets; Memory management; Power demand; Predictive models; GPU; energy; model; performance; prediction;
Conference_Title :
2011 IEEE Ninth International Conference on Dependable, Autonomic and Secure Computing (DASC)
Conference_Location :
Sydney, NSW, Australia
Print_ISBN :
978-1-4673-0006-3
DOI :
10.1109/DASC.2011.117