Regional Information Center for Science and Technology - A detailed GPU cache model based on reuse distance theory

DocumentCode :

157807

Title :

A detailed GPU cache model based on reuse distance theory

Author :

Nugteren, Cedric ; van den Braak, Gert-Jan ; Corporaal, Henk ; Bal, Henri

Author_Institution :

Eindhoven Univ. of Technol., Eindhoven, Netherlands

fYear :

2014

fDate :

15-19 Feb. 2014

Firstpage :

Lastpage :

Abstract :

As modern GPUs rely partly on their on-chip memories to counter the imminent off-chip memory wall, the efficient use of their caches has become important for performance and energy. However, optimising cache locality systematically requires insight into and prediction of cache behaviour. On sequential processors, stack distance or reuse distance theory is a well-known means to model cache behaviour. However, it is not straightforward to apply this theory to GPUs, mainly because of the parallel execution model and fine-grained multi-threading. This work extends reuse distance to GPUs by modelling: (1) the GPU's hierarchy of threads, warps, threadblocks, and sets of active threads, (2) conditional and non-uniform latencies, (3) cache associativity, (4) miss-status holding-registers, and (5) warp divergence. We implement the model in C++ and extend the Ocelot GPU emulator to extract lists of memory addresses. We compare our model with measured cache miss rates for the Parboil and PolyBench/GPU benchmark suites, showing a mean absolute error of 6% and 8% for two cache configurations. We show that our model is faster and even more accurate compared to the GPGPU-Sim simulator.
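
For readers unfamiliar with the sequential baseline the abstract refers to, the following is a minimal sketch of classical reuse (stack) distance for a fully associative LRU cache. It is not the authors' model: it ignores every GPU extension listed above (thread hierarchy, latencies, associativity, miss-status holding-registers, warp divergence). The function names `reuse_distances` and `miss_rate` and the 128-byte line size are assumptions made for illustration only.

```python
from collections import OrderedDict


def reuse_distances(trace, line_size=128):
    """Return the reuse distance of each access in `trace` (byte addresses).

    The reuse distance is the number of distinct cache lines touched since
    the previous access to the same line; first uses get infinity.
    `line_size` is an illustrative assumption, not a value from the record.
    """
    stack = OrderedDict()  # cache lines, least recently used first
    distances = []
    for address in trace:
        line = address // line_size
        if line in stack:
            keys = list(stack.keys())
            # Lines after `line` in the LRU order were touched more recently.
            distances.append(len(keys) - 1 - keys.index(line))
            stack.move_to_end(line)
        else:
            distances.append(float("inf"))  # first use: compulsory miss
            stack[line] = True
    return distances


def miss_rate(distances, cache_lines):
    """Miss rate of a fully associative LRU cache holding `cache_lines` lines.

    An access hits exactly when its reuse distance is below the capacity.
    """
    misses = sum(1 for d in distances if d >= cache_lines)
    return misses / len(distances)
```

Under these assumptions, one pass over an address trace gives the miss rate for any cache capacity by comparing each distance against that capacity, which is what makes the theory attractive as a starting point for a cache model.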

Keywords :

C++ language; benchmark testing; cache storage; graphics processing units; multi-threading; storage allocation; GPU cache model; Ocelot GPU emulator; Parboil benchmark suites; PolyBench/GPU benchmark suites; active thread hierarchy; cache associativity; cache behaviour prediction; cache configurations; cache locality optimisation; cache miss rates; conditional nonuniform latencies; fine-grained multithreading; mean absolute error; memory address list extraction; miss-status holding-registers; parallel execution model; reuse distance theory; sequential processors; stack distance; thread hierarchy; threadblock hierarchy; warp divergence; warp hierarchy; Computer architecture; Data models; Instruction sets; Kernel; System-on-chip

fLanguage :

English

Publisher :

IEEE

Conference_Title :

High Performance Computer Architecture (HPCA), 2014 IEEE 20th International Symposium on

Conference_Location :

Orlando, FL

Type :

conf

DOI :

10.1109/HPCA.2014.6835955

Filename :

6835955

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=157807