DocumentCode :
1917823
Title :
GPU/CPU Work Sharing with Parallel Language XcalableMP-dev for Parallelized Accelerated Computing
Author :
Odajima, Tetsuya ; Boku, Taisuke ; Hanawa, Toshihiro ; Lee, Jinpil ; Sato, Mitsuhisa
Author_Institution :
Grad. Sch. of Syst. & Inf. Eng., Univ. of Tsukuba, Tsukuba, Japan
fYear :
2012
fDate :
10-13 Sept. 2012
Firstpage :
97
Lastpage :
106
Abstract :
In this paper, we propose a framework that enables CPUs and GPUs to share parallel processing work on hybrid PC clusters, based on the high-level parallel language XcalableMP-dev. Basic XcalableMP enables high-level parallel programming by adding directives to sequential code; the directives support data distribution and loop/task distribution among multiple nodes of a PC cluster. XcalableMP-dev is an extension of XcalableMP for hybrid PC clusters, where each node is equipped with accelerated computing devices such as GPUs or many-core processors. Our new framework, named XcalableMP-dev/StarPU, enables the distribution of data and loop execution among multiple GPUs and multiple CPU cores on each node. We employ the StarPU run-time system for task management with dynamic load balancing. Because of the large performance gap between CPUs and GPUs, the key issue for work sharing among CPU and GPU resources is controlling the size of the tasks assigned to each device. Since the compiler of the new system is still under construction, we evaluated the performance of hybrid work sharing among four nodes of a GPU cluster and confirmed a performance gain of up to 1.4 times over GPU-only execution by the traditional XcalableMP-dev system on NVIDIA CUDA.
Keywords :
graphics processing units; multiprocessing systems; parallel architectures; parallel programming; performance evaluation; resource allocation; CPU; GPU; NVIDIA CUDA; StarPU run-time system; XcalableMP-dev high-level parallel language; data distribution; dynamic load balancing; high-level parallel programming; hybrid PC clusters; loop distribution; loop execution; many-core environments; parallel processing; parallelized accelerated computing; performance evaluation; sequential code directives; task distribution; task management; task size control; Acceleration; Arrays; Distributed databases; Graphics processing unit; Multicore processing; Performance evaluation; Synchronization;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel Processing Workshops (ICPPW), 2012 41st International Conference on
Conference_Location :
Pittsburgh, PA
ISSN :
1530-2016
Print_ISBN :
978-1-4673-2509-7
Type :
conf
DOI :
10.1109/ICPPW.2012.16
Filename :
6337468