Regional Information Center for Science and Technology - A quantitative performance analysis model for GPU architectures

DocumentCode :

2947347

Title :

A quantitative performance analysis model for GPU architectures

Author :

Zhang, Yao ; Owens, John D.

Author_Institution :

Dept. of Electr. & Comput. Eng., Univ. of California, Davis, CA, USA

fYear :

2011

fDate :

12-16 Feb. 2011

Firstpage :

382

Lastpage :

393

Abstract :

We develop a microbenchmark-based performance model for NVIDIA GeForce 200-series GPUs. Our model identifies GPU program bottlenecks and quantitatively analyzes performance, which lets programmers and architects predict the benefits of potential program optimizations and architectural improvements. In particular, we use a microbenchmark-based approach to develop a throughput model for three major components of GPU execution time: the instruction pipeline, shared memory access, and global memory access. Because our model is based on the GPU's native instruction set, it predicts performance with 5-15% error. To demonstrate the usefulness of the model, we analyze three representative, real-world, already highly optimized programs: dense matrix multiply, a tridiagonal systems solver, and sparse matrix-vector multiply. The model provides a detailed quantitative performance analysis, allowing us to understand the configuration of the fastest dense matrix multiply implementation and to optimize the tridiagonal solver and sparse matrix-vector multiply by 60% and 18%, respectively. Furthermore, applying the model to these codes allows us to suggest architectural improvements in hardware resource allocation, bank-conflict avoidance, block scheduling, and memory transaction granularity.
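The abstract describes separate throughput models for the instruction pipeline, shared memory access, and global memory access, used together to locate a program's bottleneck and to estimate the payoff of an optimization. The following is a minimal sketch of that idea, not the authors' formulation: it assumes the three components overlap fully, so the slowest one bounds kernel time. The class names, fields, and all throughput numbers are hypothetical stand-ins for values that microbenchmarks would measure.

```python
from dataclasses import dataclass


@dataclass
class KernelProfile:
    # Hypothetical per-kernel work counts (illustrative, not from the paper).
    instructions: float         # dynamic instructions issued
    shared_transactions: float  # shared memory transactions, bank conflicts included
    global_bytes: float         # bytes moved to/from global memory


@dataclass
class GpuThroughput:
    # Hypothetical peak rates a microbenchmark suite would measure.
    instr_per_sec: float
    shared_tx_per_sec: float
    global_bytes_per_sec: float


def component_times(k: KernelProfile, g: GpuThroughput) -> dict:
    """Time each component would need if it ran alone at its measured throughput."""
    return {
        "instruction pipeline": k.instructions / g.instr_per_sec,
        "shared memory": k.shared_transactions / g.shared_tx_per_sec,
        "global memory": k.global_bytes / g.global_bytes_per_sec,
    }


def predict(k: KernelProfile, g: GpuThroughput):
    """Assume full overlap: the slowest component is the bottleneck and sets the time."""
    times = component_times(k, g)
    bottleneck = max(times, key=times.get)
    return times[bottleneck], bottleneck


def what_if_global_traffic(k: KernelProfile, g: GpuThroughput, factor: float) -> float:
    """Predicted speedup if an optimization cut global memory traffic by `factor`."""
    before, _ = predict(k, g)
    improved = KernelProfile(k.instructions, k.shared_transactions, k.global_bytes / factor)
    after, _ = predict(improved, g)
    return before / after


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    kernel = KernelProfile(instructions=2e9, shared_transactions=1e8, global_bytes=1e9)
    gpu = GpuThroughput(instr_per_sec=4e11, shared_tx_per_sec=1e11, global_bytes_per_sec=1e11)
    t, b = predict(kernel, gpu)
    print(f"predicted time {t:.4f} s, bottleneck: {b}")
    print(f"speedup from halving global traffic: {what_if_global_traffic(kernel, gpu, 2.0):.2f}x")
```

With these invented inputs the global-memory term (0.01 s) dominates, and halving global traffic only helps until the instruction pipeline becomes the new bottleneck, which is the kind of what-if reasoning the abstract attributes to the model.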

Keywords :

computer graphic equipment; coprocessors; performance evaluation; GPU architectures; NVIDIA GeForce 200-series GPU; dense matrix multiply; global memory access; hardware resource allocation; instruction pipeline; microbenchmark based approach; quantitative performance analysis model; shared memory access; sparse matrix vector multiply; tridiagonal systems solver; Analytical models; Bandwidth; Computational modeling; Graphics processing unit; Instruction sets; Pipelines; Throughput;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

High Performance Computer Architecture (HPCA), 2011 IEEE 17th International Symposium on

Conference_Location :

San Antonio, TX

ISSN :

1530-0897

Print_ISBN :

978-1-4244-9432-3

Type :

conf

DOI :

10.1109/HPCA.2011.5749745

Filename :

5749745

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2947347