DocumentCode :
48986
Title :
Efficient GPU Spatial-Temporal Multitasking
Author :
Yun Liang ; Huynh, Huynh Phung ; Rupnow, Kyle ; Goh, Rick Siow Mong ; Deming Chen
Author_Institution :
Center for Energy-Efficient Comput. & Applic., Peking Univ., Beijing, China
Volume :
26
Issue :
3
fYear :
2015
fDate :
Mar-15
Firstpage :
748
Lastpage :
760
Abstract :
Heterogeneous computing nodes are now pervasive throughout computing, and GPUs have emerged as a leading computing device for application acceleration. GPUs have tremendous computing potential for data-parallel applications, and their emergence has led to a proliferation of GPU-accelerated applications. This proliferation has in turn led to systems in which many applications compete for access to GPU resources, so efficient utilization of those resources is critical to system performance. Prior temporal multitasking techniques can be applied to GPU resources as well, but not all GPU kernels make full use of the GPU. There is, therefore, an unmet need for spatial multitasking in GPUs: resources used inefficiently by one kernel can instead be assigned to another kernel that can use them more effectively. In this paper we propose a software-hardware solution for efficient spatial-temporal multitasking and a software-based emulation framework for our system. We pair an efficient software heuristic with hardware leaky-bucket-based thread-block interleaving to implement spatial-temporal multitasking. We demonstrate our techniques on various GPU architectures using nine representative benchmarks from the CUDA SDK. Our experiments on the Fermi GTX480 demonstrate performance improvements of up to 46% (average 26%) over sequential GPU task execution and up to 37% (average 18%) over default concurrent multitasking. On the state-of-the-art Kepler K20 with Hyper-Q technology, our technique achieves up to 40% (average 17%) performance improvement over default concurrent multitasking.
Keywords :
graphics processing units; multiprocessing systems; multiprogramming; parallel architectures; CUDA SDK; Fermi GTX480; GPU architecture; GPU spatial-temporal multitasking; Hyper-Q technology; Kepler K20; default concurrent multitasking; hardware leaky-bucket; sequential GPU task execution; software based emulation framework; thread-block interleaving; Bandwidth; Graphics processing units; Instruction sets; Kernel; Multitasking; Resource management; Schedules; GPU; multitasking; resource allocation; spatial; temporal;
fLanguage :
English
Journal_Title :
Parallel and Distributed Systems, IEEE Transactions on
Publisher :
IEEE
ISSN :
1045-9219
Type :
jour
DOI :
10.1109/TPDS.2014.2313342
Filename :
6777559