Efficient GPU Spatial-Temporal Multitasking

Author

Liang, Yun ; Huynh, Huynh Phung ; Rupnow, Kyle ; Goh, Rick Siow Mong ; Chen, Deming

Author_Institution

Center for Energy-Efficient Computing and Applications, Peking University, Beijing, China

Volume

26

Issue

3

fYear

2015

fDate

March 2015

Firstpage

748

Lastpage

760

Abstract

Heterogeneous computing nodes are now pervasive throughout computing, and GPUs have emerged as a leading computing device for application acceleration. GPUs have tremendous computing potential for data-parallel applications, and their emergence has led to a proliferation of GPU-accelerated applications. This proliferation has in turn led to systems in which many applications compete for access to GPU resources, so efficient utilization of those resources is critical to system performance. Prior temporal multitasking techniques can be applied to GPU resources as well, but not all GPU kernels make full use of the GPU. There is, therefore, an unmet need for spatial multitasking on GPUs: resources used inefficiently by one kernel can instead be assigned to another kernel that can use them more effectively. In this paper, we propose a software-hardware solution for efficient spatial-temporal multitasking and a software-based emulation framework for our system. We pair an efficient software heuristic with hardware leaky-bucket-based thread-block interleaving to implement spatial-temporal multitasking. We demonstrate our techniques on various GPU architectures using nine representative benchmarks from the CUDA SDK. Our experiments on a Fermi GTX480 demonstrate performance improvements of up to 46% (average 26%) over sequential GPU task execution and up to 37% (average 18%) over default concurrent multitasking. On the state-of-the-art Kepler K20, which uses Hyper-Q technology, our technique achieves up to 40% (average 17%) performance improvement over default concurrent multitasking.

Keywords

graphics processing units; multiprocessing systems; multiprogramming; parallel architectures; CUDA SDK; Fermi GTX480; GPU architecture; GPU spatial-temporal multitasking; Hyper-Q technology; Kepler K20; default concurrent multitasking; hardware leaky-bucket; sequential GPU task execution; software-based emulation framework; thread-block interleaving; Bandwidth; Graphics processing units; Instruction sets; Kernel; Multitasking; Resource management; Schedules; GPU; multitasking; resource allocation; spatial; temporal

fLanguage

English

Journal_Title

Parallel and Distributed Systems, IEEE Transactions on

Publisher

IEEE

ISSN

1045-9219

Type

jour

DOI

10.1109/TPDS.2014.2313342

Filename

6777559