DocumentCode :
726357
Title :
VWS: A versatile warp scheduler for exploring diverse cache localities of GPGPU applications
Author :
Mengjie Mao ; Jingtong Hu ; Yiran Chen ; Hai Li
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of Pittsburgh, Pittsburgh, PA, USA
fYear :
2015
fDate :
8-12 June 2015
Firstpage :
1
Lastpage :
6
Abstract :
The massive multi-threading of GPGPUs demands efficient use of caches with limited capacity. In this work, we propose a versatile warp scheduler (VWS) to reduce the cache miss rate in GPGPUs. VWS retains intra-warp cache locality using an efficient per-warp working set estimator and enhances intra- and inter-cooperative thread array (CTA) cache locality by imposing a CTA-aware scheduling policy and a new CTA dispatching mechanism. The significantly improved hit rate of the cache hierarchy enables VWS to achieve, on average, 38.4% and 9.3% IPC improvement across diverse GPGPU applications compared to a widely used warp scheduler and a state-of-the-art warp scheduler, respectively.
Keywords :
cache storage; graphics processing units; multi-threading; processor scheduling; CTA dispatching mechanism; CTA-aware scheduling policy; GPGPU application; IPC; VWS; cache hierarchy; cache locality; cache miss rate; intra-inter (CTA); intra-inter-cooperative thread array; multithreading; versatile warp scheduler; Dispatching; Hardware; Instruction sets; Kernel; Mathematical model; Radiation detectors; Scheduling;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Design Automation Conference (DAC), 2015 52nd ACM/EDAC/IEEE
Conference_Location :
San Francisco, CA
Type :
conf
DOI :
10.1145/2744769.2744931
Filename :
7167267