Dynamic Task Scheduling Scheme for a GPGPU Programming Framework

Author

Kazuhiko Ohno;Rei Yamamoto

Author_Institution

Dept. of Inf. Eng., Mie Univ., Tsu, Japan

fYear

2015

Firstpage

181

Lastpage

187

Abstract

The computational power and physical memory size of a single GPU device are often insufficient for large-scale problems. With CUDA, the user must explicitly partition such a problem into several tasks, repeating data transfer and kernel execution for each. Using multiple GPUs additionally requires explicit device switching. Furthermore, low-level hand optimizations such as load balancing and determining task granularity are required to achieve high performance. To handle large-scale problems without any additional user code, we introduce an implicit dynamic task scheduling scheme into our CUDA variant, MESI-CUDA. MESI-CUDA is designed to abstract low-level GPU features: virtual shared variables and logical thread mappings hide the complex memory hierarchy and physical characteristics. Explicit parallel execution using kernel functions, on the other hand, is the same as in CUDA. In our scheme, each kernel invocation in the user code is translated into a job submission to the runtime scheduler. The scheduler partitions a job into tasks according to the device memory size and dynamically schedules them onto the available GPU devices. Thus, the user can simply specify kernel invocations independently of the execution environment. The evaluation results show that our scheme can automatically utilize heterogeneous GPU devices with small overhead.
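
The partition-and-dispatch idea described in the abstract can be illustrated with a small host-side simulation. This is only a sketch under stated assumptions, not the MESI-CUDA runtime: the names Device, partition, and schedule are hypothetical, the memory and throughput figures are made up, tasks are sized by the smallest device memory (one possible policy, not necessarily the paper's), and data transfer time is ignored.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    mem_bytes: int      # usable device memory (hypothetical figure)
    throughput: float   # elements processed per second (hypothetical figure)


def partition(n_elems, bytes_per_elem, devices):
    """Split a job of n_elems elements into (start, end) tasks."""
    # Assumption: size tasks by the smallest device memory so that any
    # task fits on any device.
    max_elems = min(d.mem_bytes for d in devices) // bytes_per_elem
    if max_elems == 0:
        raise ValueError("a single element does not fit in device memory")
    return [(start, min(start + max_elems, n_elems))
            for start in range(0, n_elems, max_elems)]


def schedule(tasks, devices):
    """Hand each task, in order, to whichever device becomes idle first."""
    # Heap entries are (time at which the device becomes idle, device index).
    idle = [(0.0, i) for i in range(len(devices))]
    heapq.heapify(idle)
    assignment = {d.name: [] for d in devices}
    makespan = 0.0
    for start, end in tasks:
        t, i = heapq.heappop(idle)
        dev = devices[i]
        finish = t + (end - start) / dev.throughput
        assignment[dev.name].append((start, end))
        makespan = max(makespan, finish)
        heapq.heappush(idle, (finish, i))
    return assignment, makespan


# Two unequal devices: the faster one ends up with more of the tasks.
devices = [Device("fast", 4000, 200.0), Device("slow", 1000, 100.0)]
tasks = partition(1000, 4, devices)
assignment, makespan = schedule(tasks, devices)
print(assignment, makespan)
```

Because a task is assigned only when a device becomes idle, the faster device naturally receives more tasks, which is the load-balancing effect a dynamic scheme aims for on heterogeneous GPUs.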

Keywords

"Graphics processing units","Kernel","Instruction sets","Data transfer","Performance evaluation","Optimization","Dynamic scheduling"

Publisher

IEEE

Conference_Titel

Computing and Networking (CANDAR), 2015 Third International Symposium on

Electronic_ISBN

2379-1896

Type

conf

DOI

10.1109/CANDAR.2015.103

Filename

7424708