DocumentCode
3682587
Title
Revealing Critical Loads and Hidden Data Locality in GPGPU Applications
Author
Gunjae Koo;Hyeran Jeon;Murali Annavaram
Author_Institution
Ming Hsieh Dept. of Electr. Eng., Univ. of Southern California, Los Angeles, CA, USA
fYear
2015
Firstpage
120
Lastpage
129
Abstract
In graphics processing units (GPUs), memory access latency is one of the most critical performance hurdles. Several warp schedulers and memory prefetching algorithms have been proposed to mitigate this long memory access latency. Prior application characterization studies shed light on the interaction between applications, GPU microarchitecture, and memory subsystem behavior. Most of these studies, however, present only aggregate statistics on how the memory system behaves over an entire application run. In particular, they do not consider how individual load instructions in a program contribute to the aggregate memory system behavior. The analysis presented in this paper shows that there are two distinct classes of load instructions, categorized as deterministic and non-deterministic loads. Using a combination of profiling data from a real GPU card and cycle-accurate simulation data, we show that these two types of loads differ significantly in their performance impact. We discuss and suggest several approaches to treat these two load categories differently within the GPU microarchitecture to optimize memory system performance.
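The abstract does not spell out how a load is assigned to one of the two classes. As a rough illustration only, the sketch below assumes that a load is deterministic when its address is computed solely from thread/block indices and kernel parameters, and non-deterministic when its address depends on a value previously loaded from memory (for example, the indirect access `a[idx[i]]`). It propagates that data dependence through a hypothetical straight-line instruction list; the `Instr` tuple format, the register and operand names, and `classify_loads` are invented for this example and are not the paper's tooling.

```python
from typing import Dict, List, Tuple

# Hypothetical toy IR: (destination register, opcode, source operands).
# For "ld", the sources are the operands that form the load address.
Instr = Tuple[str, str, List[str]]


def classify_loads(program: List[Instr]) -> Dict[int, str]:
    """Label each load in straight-line code (no branches assumed).

    Assumption: a register is "data-dependent" if its value derives,
    directly or transitively, from a value loaded from memory.
    """
    data_dependent = set()  # registers tainted by loaded data
    labels: Dict[int, str] = {}
    for i, (dest, op, srcs) in enumerate(program):
        tainted = any(s in data_dependent for s in srcs)
        if op == "ld":
            # Address built only from untainted operands -> deterministic.
            labels[i] = "non-deterministic" if tainted else "deterministic"
            data_dependent.add(dest)  # the loaded value is memory data
        elif tainted:
            data_dependent.add(dest)
        else:
            data_dependent.discard(dest)  # redefined from clean operands
    return labels


# Indirect access a[idx[i]] with i = bid * ntid + tid (byte scaling omitted).
program: List[Instr] = [
    ("r1", "mul", ["bid", "ntid"]),
    ("r2", "add", ["r1", "tid"]),        # global thread index i
    ("r3", "add", ["param_idx", "r2"]),  # address of idx[i]
    ("r4", "ld", ["r3"]),                # idx[i]
    ("r5", "add", ["param_a", "r4"]),    # address of a[idx[i]]
    ("r6", "ld", ["r5"]),                # a[idx[i]]
]

print(classify_loads(program))
```

In this toy program the load of `idx[i]` is labeled deterministic because its address uses only indices and a kernel parameter, while the load of `a[idx[i]]` is labeled non-deterministic because its address depends on the value returned by the first load.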
Keywords
"Graphics processing units","Instruction sets","Hardware","Registers","Image processing","Kernel","Microarchitecture"
Publisher
IEEE
Conference_Title
Workload Characterization (IISWC), 2015 IEEE International Symposium on
Type
conf
DOI
10.1109/IISWC.2015.23
Filename
7314158