DocumentCode :
1954803
Title :
OP2: An active library framework for solving unstructured mesh-based applications on multi-core and many-core architectures
Author :
Mudalige, G.A. ; Giles, M.B. ; Reguly, I. ; Bertolli, C. ; Kelly, P.H.J.
Author_Institution :
Oxford eResearch Centre, Univ. of Oxford, Oxford, UK
fYear :
2012
fDate :
13-14 May 2012
Firstpage :
1
Lastpage :
12
Abstract :
OP2 is an “active” library framework for the solution of unstructured mesh-based applications. It utilizes source-to-source translation and compilation so that a single application code written using the OP2 API can be transformed into different parallel implementations for execution on different back-end hardware platforms. In this paper we present the design of the current OP2 library, and investigate its capabilities in achieving performance portability, near-optimal performance, and scaling on modern multi-core and many-core processor based systems. A key feature of this work is OP2's recent extension facilitating the development and execution of applications on a distributed memory cluster of GPUs. We discuss the main design issues in parallelizing unstructured mesh-based applications on heterogeneous platforms. These include handling data dependencies in accessing indirectly referenced data, the impact of unstructured mesh data layouts (array of structs vs. struct of arrays) and design considerations in generating code for execution on a cluster of GPUs. A representative CFD application written using the OP2 framework is utilized to provide a contrasting benchmarking and performance analysis study on a range of multi-core/many-core systems. These include multi-core CPUs from Intel (Westmere and Sandy Bridge) and AMD (Magny-Cours), GPUs from NVIDIA (GTX560Ti, Tesla C2070), a distributed memory CPU cluster (Cray XE6) and a distributed memory GPU cluster (Tesla C2050 GPUs with InfiniBand). OP2's design choices are explored with quantitative insights into their contributions to performance. We demonstrate that an application written once at a high level using the OP2 API is easily portable across a wide range of contrasting platforms and is capable of achieving near-optimal performance without the intervention of the domain application programmer.
Keywords :
aerodynamics; application program interfaces; computational fluid dynamics; data handling; distributed memory systems; graphics processing units; libraries; mesh generation; parallel processing; program compilers; AMD; CFD application; OP2 API; OP2 library; active library framework; back-end hardware platforms; compilation; data dependency handling; distributed memory CPU cluster; distributed memory GPU cluster; heterogeneous platforms; many-core architectures; multicore CPU; multicore architectures; near-optimal performance; parallel implementations; performance portability; source-to-source translation; unstructured mesh data layout impact; unstructured mesh-based applications; Abstracts; Active Library; Domain Specific Language; GPU; OP2; Unstructured mesh;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Innovative Parallel Computing (InPar), 2012
Conference_Location :
San Jose, CA
Print_ISBN :
978-1-4673-2632-2
Electronic_ISBN :
978-1-4673-2631-5
Type :
conf
DOI :
10.1109/InPar.2012.6339594
Filename :
6339594