DocumentCode :
2996330
Title :
Implementation of XcalableMP Device Acceleration Extension with OpenCL
Author :
Nomizu, Takuma ; Takahashi, Daisuke ; Lee, Jinpil ; Boku, Taisuke ; Sato, Mitsuhisa
Author_Institution :
Grad. Sch. of Syst. & Inf. Eng., Univ. of Tsukuba, Tsukuba, Japan
fYear :
2012
fDate :
21-25 May 2012
Firstpage :
2394
Lastpage :
2403
Abstract :
Due to their outstanding computational performance, many acceleration devices, such as GPUs, the Cell Broadband Engine (Cell/B.E.), and multi-core processors, are attracting a lot of attention in the field of high-performance computing. Although many programming models and languages have been designed for programming accelerators, such as CUDA, AMD Accelerated Parallel Processing (AMD APP), and OpenCL, these models remain difficult and complex to use. Furthermore, when programming accelerator-enhanced clusters, an inter-node programming interface such as MPI must also be used to coordinate the nodes. To address these problems and reduce complexity, we propose XcalableMP Device Acceleration Extension (XMP-dev), an extension of XcalableMP (XMP), a PGAS language, for use on accelerator-enhanced clusters. In XMP-dev, globally distributed data are mapped onto the distributed memory of each accelerator, and code fragments can be offloaded for execution on a set of accelerators. This eliminates the complex programming otherwise required both between nodes and accelerators and between nodes. In this paper, we present an implementation of the XMP-dev runtime library using the OpenCL APIs, whereas the previous implementation targeted CUDA only. Since OpenCL is a standardized interface supported by various kinds of accelerators, it improves the portability of XMP-dev and reduces development cost. Our performance evaluation shows that the OpenCL implementation of XMP-dev can generate portable programs that run not only on NVIDIA GPU-enhanced clusters but also on various other accelerator-enhanced clusters.
Keywords :
application program interfaces; graphics processing units; message passing; multiprocessing systems; parallel architectures; parallel programming; AMD APP; AMD Accelerated Parallel Processing; CUDA; Cell Broadband Engine; Cell/BE; MPI; NVIDIA GPU-enhanced cluster; OpenCL API; OpenCL implementation; PGAS language; XMP-dev portability; XMP-dev runtime library; XcalableMP Device Acceleration Extension; acceleration device; accelerator-enhanced cluster; code fragment; complexity reduction; computational performance; distributed memory; global distributed data mapping; high-performance computing; internode programming interface; multicore computing; node coordination; performance evaluation; programming accelerator; programming language; programming model; standardized interface; Acceleration; Arrays; Graphics processing unit; Kernel; Programming; Synchronization; Accelerator; Cluster; OpenCL;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2012 IEEE 26th International
Conference_Location :
Shanghai
Print_ISBN :
978-1-4673-0974-5
Type :
conf
DOI :
10.1109/IPDPSW.2012.296
Filename :
6270611