Multi-science applications with single codebase — GAMER — For massively parallel architectures

Author

Shukla, Hemant ; Schive, Hsi-Yu ; Woo, Tak-Pong ; Chiueh, Tzihong

Author_Institution

Lawrence Berkeley Nat. Lab., Berkeley, CA, USA

fYear

2011

fDate

12-18 Nov. 2011

Firstpage

Lastpage

Abstract

The growing need for power-efficient extreme-scale high-performance computing (HPC), coupled with plateauing clock speeds, is driving the emergence of massively parallel compute architectures. Tens to many hundreds of cores are increasingly made available as compute units, either as an integral part of the main processor or as coprocessors designed for handling massively parallel workloads. In the case of many-core graphics processing units (GPUs), hundreds of SIMD cores primarily designed for image and video rendering are used for high-performance scientific computations. The new architectures typically offer ANSI standard programming models such as CUDA (NVIDIA) and OpenCL. However, wide-ranging adoption of these parallel architectures is hampered by a steep learning curve and requires reengineering existing applications, which mostly leads to expensive and error-prone code rewrites with no prior guarantee or knowledge of any speedup. A broad range of complex scientific applications across many domains use common algorithms and techniques, such as adaptive mesh refinement (AMR), advanced hydrodynamics partial differential equation (PDE) solvers and Poisson-gravity solvers, that have been shown to perform highly efficiently on GPU-based systems. Taking advantage of these commonalities, we use the GPU-aware AMR code GAMER [1] to examine a unique approach to solving multi-science problems in astrophysics, hydrodynamics and particle physics with a single codebase. We demonstrate significant speedups in disparate classes of scientific applications on three separate clusters: Dirac, Laohu and Mole 8.5. By extensively reusing the extensible single codebase, we mitigate the impediment of significant code rewrites. We also collect performance and energy consumption benchmark metrics on 50 nodes with NVIDIA C2050 GPUs and Intel 8-core Nehalem CPUs on the Dirac cluster at the National Energy Research Scientific Computing Center (NERSC).
In addition, we propose a strategy and framework for legacy and new applications to successfully leverage the evolving GAMER codebase on massively parallel architectures. The framework and the benchmarks are intended to help quantify the adoption strategies for legacy and new scientific applications.

Keywords

ANSI standards; energy consumption; graphics processing units; mainframes; parallel architectures; parallel machines; power aware computing; rendering (computer graphics); 50-nodes NVIDIA C2050 GPU; ANSI standard programming models; GAMER codebase; GPU based systems; GPU-aware AMR code; Intel 8-core Nehalem CPU; National Energy Research Supercomputing Center; OpenCL; Poisson-Gravity solvers; SIMD cores; adaptive mesh refinements; advanced hydrodynamics partial differential equation solvers; clock-speeds; energy consumption benchmark metrics; error prone code rewriting; image rendering; many-core graphics processing units; massively parallel compute architectures; multiscience applications; parallel workload handling; power efficient extreme-scale high performance computing; video rendering; Computational modeling; Graphics processing unit; Instruction sets; Kernel; Mathematical model; Memory management; AMR; GPU; Poisson-Gravity solvers; benchmarks; hydrodynamics; simulations;

fLanguage

English

Publisher

IEEE

Conference_Title

High Performance Computing, Networking, Storage and Analysis (SC), 2011 International Conference for

Conference_Location

Seattle, WA

Electronic_ISBN

978-1-4503-0771-0

Type

conf

Filename

6114450

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=560183