Title :
Accurately modeling the GPU memory subsystem
Author :
Candel, Francisco ; Petit, Salvador ; Sahuquillo, Julio ; Duato, Jose
Author_Institution :
Dept. of Comput. Eng., Univ. Politec. de Valencia, Valencia, Spain
Abstract :
Nowadays, research on GPU processor architecture is extraordinarily active since these architectures offer much more performance per watt than CPU architectures. This is the main reason why massive deployment of GPU multiprocessors is considered one of the most feasible solutions to attain exascale computing capabilities. In this context, ongoing GPU architecture research is required to improve GPU programmability as well as to integrate CPU and GPU cores in the same die. One of the most important research topics in current GPUs is the GPU memory hierarchy, since its design goals are very different from those of conventional CPU memory hierarchies. To explore novel designs that better support general-purpose computing on GPUs (GPGPU computing) and to improve the performance of GPU and CPU/GPU systems, researchers often require advanced microarchitectural simulators with detailed models of the memory subsystem. Nevertheless, due to the fast pace at which current GPU architectures evolve, the simulation accuracy of existing state-of-the-art simulators suffers. This paper focuses on accurately modeling the GPU memory subsystem. We identified three main aspects that should be modeled with more accuracy: i) miss status holding registers, ii) coalescing vector memory requests, and iii) non-blocking GPU stores. To this end, we extend the Multi2Sim heterogeneous CPU/GPU processor simulator to model these aspects with enough accuracy. Experimental results show that if these aspects are not considered in the simulation framework, performance deviations in some applications can reach up to 70%, 75%, and 60%, respectively.
Keywords :
graphics processing units; memory architecture; multiprocessing systems; CPU memory hierarchies; GPGPU computing; GPU cores; GPU memory hierarchy; GPU memory subsystem modeling; GPU multiprocessors; GPU processor architecture; GPU programmability; Multi2Sim heterogeneous CPU-GPU processor simulator; advanced microarchitectural simulators; coalescing vector memory requests; exascale computing capabilities; general purpose computing; miss status holding registers; nonblocking GPU stores; Computational modeling; Computer architecture; Graphics processing units; Load modeling;
Conference_Title :
High Performance Computing & Simulation (HPCS), 2015 International Conference on
Conference_Location :
Amsterdam
Print_ISBN :
978-1-4673-7812-3
DOI :
10.1109/HPCSim.2015.7237038