Regional Information Center for Science and Technology - Implementation of XcalableMP Device Acceleration Extention with OpenCL

DocumentCode :

2996330

Title :

Implementation of XcalableMP Device Acceleration Extention with OpenCL

Author :

Nomizu, Takuma ; Takahashi, Daisuke ; Lee, Jinpil ; Boku, Taisuke ; Sato, Mitsuhisa

Author_Institution :

Grad. Sch. of Syst. & Inf. Eng., Univ. of Tsukuba, Tsukuba, Japan

fYear :

2012

fDate :

21-25 May 2012

Firstpage :

2394

Lastpage :

2403

Abstract :

Due to their outstanding computational performance, many acceleration devices, such as GPUs, the Cell Broadband Engine (Cell/B.E.), and multi-core computing, are attracting a lot of attention in the field of high-performance computing. Although there are many programming models and languages designed for programming accelerators, such as CUDA, AMD Accelerated Parallel Processing (AMD APP), and OpenCL, these models remain difficult and complex. Furthermore, when programming for accelerator-enhanced clusters, we have to use an inter-node programming interface, such as MPI, to coordinate the nodes. To address these problems and reduce complexity, an extension to XcalableMP (XMP), a PGAS language, for use on accelerator-enhanced clusters, called XcalableMP Device Acceleration Extension (XMP-dev), is proposed. In XMP-dev, global distributed data is mapped onto the distributed memory of each accelerator, and a fragment of code can be offloaded to execute on a set of accelerators. This eliminates the complex programming between nodes and accelerators and between nodes. In this paper, we present an implementation of the XMP-dev runtime library with the OpenCL APIs, whereas the previous implementation targets CUDA only. Since OpenCL is a standardized interface supported by various kinds of accelerators, it improves the portability of XMP-dev and reduces the cost of development. In the performance evaluation, we show that the OpenCL implementation of XMP-dev can generate portable programs that run not only on NVIDIA GPU-enhanced clusters but also on various other accelerator-enhanced clusters.
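The idea of mapping global distributed data onto each accelerator's memory and offloading a code fragment can be illustrated with a minimal sketch. This is not XMP-dev syntax and not the authors' runtime: it is a plain Python simulation that assumes a simple block distribution, and the names `block_range` and `offload_scale` are hypothetical helpers introduced only for illustration.

```python
def block_range(n, num_devices, rank):
    """Return the half-open global index range [lo, hi) owned by device `rank`.

    Assumes a block distribution: the n elements are cut into
    contiguous chunks of ceil(n / num_devices) elements each.
    """
    chunk = (n + num_devices - 1) // num_devices  # ceiling division
    lo = min(rank * chunk, n)
    hi = min(lo + chunk, n)
    return lo, hi


def offload_scale(global_data, num_devices, factor):
    """Simulate offloading a scaling kernel to every device.

    Each simulated device receives only its own block (its "device
    memory"), applies the kernel locally, and the blocks are then
    gathered back in rank order.
    """
    local_results = []
    for rank in range(num_devices):
        lo, hi = block_range(len(global_data), num_devices, rank)
        local_block = global_data[lo:hi]          # copy to "device memory"
        local_results.append([x * factor for x in local_block])  # "kernel"

    gathered = []
    for block in local_results:                   # copy back to the host
        gathered.extend(block)
    return gathered
```

In an actual accelerator-enhanced cluster, the per-device loop body would run concurrently on separate devices, and the copies would be explicit host-to-device transfers; the sketch only shows how global indices are partitioned.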

Keywords :

application program interfaces; graphics processing units; message passing; multiprocessing systems; parallel architectures; parallel programming; AMD APP; AMD Accelerated Parallel Processing; CUDA; Cell Broadband Engine; Cell/BE; MPI; NVIDIA GPU-enhanced cluster; OpenCL API; OpenCL implementation; PGAS language; XMP-dev portability; XMP-dev runtime library; XcalableMP Device Acceleration Extension; acceleration device; accelerator-enhanced cluster; code fragment; complexity reduction; computational performance; distributed memory; global distributed data mapping; high-performance computing; internode programming interface; multicore computing; node coordination; performance evaluation; programming accelerator; programming language; programming model; standardized interface; Acceleration; Arrays; Graphics processing unit; Kernel; Programming; Synchronization; Accelerator; Cluster; OpenCL;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2012 IEEE 26th International

Conference_Location :

Shanghai

Print_ISBN :

978-1-4673-0974-5

Type :

conf

DOI :

10.1109/IPDPSW.2012.296

Filename :

6270611

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2996330