Title :
Increasing GPU throughput using kernel interleaved thread block scheduling
Author :
Awatramani, Mihir ; Zambreno, Joseph ; Rover, Diane
Author_Institution :
Dept. of Electrical & Computer Engineering, Iowa State University, Ames, IA, USA
Abstract :
The number of active threads required to achieve peak application throughput on graphics processing units (GPUs) depends largely on the ratio of time spent on computation to time spent accessing data from memory. While compute-intensive applications can achieve peak throughput with a small number of threads, memory-intensive applications might not achieve good throughput even at the maximum supported thread count. In this paper, we study the effects of scheduling work from multiple applications on the same GPU core. We claim that interleaving workloads from different applications on a GPU core can improve the utilization of computational units and reduce the load on the memory subsystem. Experiments on 17 application pairs from the Rodinia benchmark suite show that overall throughput increases by 7% on average.
Keywords :
benchmark testing; graphics processing units; interleaved storage; multi-threading; operating system kernels; scheduling; GPU core; GPU throughput; Rodinia benchmark suite; active threads; computational unit utilization; compute-intensive applications; interleaving workload; kernel interleaved thread block scheduling; maximum supported thread count; memory data access; memory subsystem load reduction; peak application throughput; scheduling work; Computer architecture; Instruction sets; Kernel; Message systems; Processor scheduling; Throughput; Concurrent Kernel Execution; GPGPU; Load Balancing; Thread Block Scheduling
Conference_Title :
2013 IEEE 31st International Conference on Computer Design (ICCD)
Conference_Location :
Asheville, NC
DOI :
10.1109/ICCD.2013.6657093