DocumentCode
2996259
Title
MSSM: An Efficient Scheduling Mechanism for CUDA Basing on Task Partition
Author
Cheng Luo ; Suda, Ryutaro
Author_Institution
Grad. Sch. of Inf. Sci. & Technol., Univ. of Tokyo, Tokyo, Japan
fYear
2012
fDate
17-19 Dec. 2012
Firstpage
548
Lastpage
555
Abstract
This paper presents a multiple-stream scheduling mechanism (MSSM) that enables kernel execution, host-to-device data transfer, and device-to-host data transfer to run in parallel using multiple CUDA streams. The mechanism divides kernels and bidirectional data transfers into small subtasks so that they can be overlapped easily and efficiently on a CUDA-compatible graphics processing unit (GPU). To set the optimal subtask size, we build a compute-bound model for compute-intensive applications and a data-bound model for applications dominated by bidirectional data transfer. Based on these two models, we provide three scheduling algorithms for data-dependent and data-independent applications that maximize the efficiency of the overlap. We apply the mechanism to a set of benchmarks to evaluate its performance. The results show that it successfully hides latency and achieves performance very close to the optimum.
Keywords
graphics processing units; parallel architectures; processor scheduling; CUDA; GPU; MSSM; bidirectional data transmission; efficient scheduling mechanism; graphic processing unit; kernel data transmission; multiple stream scheduling mechanism; task partition; Bidirectional control; Data transfer; Dynamic scheduling; Equations; Graphics processing units; Kernel; Mathematical model; Stream scheduling; parallelism; subtask overlap
fLanguage
English
Publisher
IEEE
Conference_Title
Parallel and Distributed Systems (ICPADS), 2012 IEEE 18th International Conference on
Conference_Location
Singapore
ISSN
1521-9097
Print_ISBN
978-1-4673-4565-1
Electronic_ISBN
1521-9097
Type
conf
DOI
10.1109/ICPADS.2012.80
Filename
6414456