• DocumentCode
    560183
  • Title

    Multi-science applications with single codebase — GAMER — For massively parallel architectures

  • Author

    Shukla, Hemant ; Schive, Hsi-Yu ; Woo, Tak-Pong ; Chiueh, Tzihong

  • Author_Institution
    Lawrence Berkeley Nat. Lab., Berkeley, CA, USA
  • fYear
    2011
  • fDate
    12-18 Nov. 2011
  • Firstpage
    1
  • Lastpage
    11
  • Abstract
    The growing need for power-efficient extreme-scale high-performance computing (HPC), coupled with plateauing clock speeds, is driving the emergence of massively parallel compute architectures. Tens to many hundreds of cores are increasingly made available as compute units, either as an integral part of the main processor or as coprocessors designed for handling massively parallel workloads. In the case of many-core graphics processing units (GPUs), hundreds of SIMD cores primarily designed for image and video rendering are used for high-performance scientific computations. The new architectures typically offer ANSI standard programming models such as CUDA (NVIDIA) and OpenCL. However, wide-ranging adoption of these parallel architectures is hindered by a steep learning curve and requires reengineering of existing applications, which mostly leads to expensive and error-prone code rewrites without any prior guarantee or knowledge of speedups. A broad range of complex scientific applications across many domains use common algorithms and techniques, such as adaptive mesh refinement (AMR), advanced hydrodynamics partial differential equation (PDE) solvers, and Poisson-Gravity solvers, that have been shown to perform highly efficiently on GPU-based systems. Taking advantage of these commonalities, we use the GPU-aware AMR code GAMER [1] to examine the unique approach of solving multi-science problems in astrophysics, hydrodynamics and particle physics with a single codebase. We demonstrate significant speedups in disparate classes of scientific applications on three separate clusters, viz., Dirac, Laohu and Mole 8.5. By extensively reusing the extendable single codebase we mitigate the impediments of significant code rewrites. We also collect performance and energy consumption benchmark metrics on 50 NVIDIA C2050 GPU and Intel 8-core Nehalem CPU nodes of the Dirac cluster at the National Energy Research Scientific Computing Center (NERSC). In addition, we propose a strategy and framework for legacy and new applications to successfully leverage the evolving GAMER codebase on massively parallel architectures. The framework and the benchmarks are intended to help quantify adoption strategies for legacy and new scientific applications.
  • Keywords
    ANSI standards; energy consumption; graphics processing units; mainframes; parallel architectures; parallel machines; power aware computing; rendering (computer graphics); 50-nodes NVIDIA C2050 GPU; ANSI standard programming models; GAMER codebase; GPU based systems; GPU-aware AMR code; Intel 8-core Nehalem CPU; National Energy Research Supercomputing Center; OpenCL; Poisson-Gravity solvers; SIMD cores; adaptive mesh refinements; advanced hydrodynamics partial differential equation solvers; clock-speeds; energy consumption benchmark metrics; error prone code rewriting; image rendering; many-core graphics processing units; massively parallel compute architectures; multiscience applications; parallel workload handling; power efficient extreme-scale high performance computing; video rendering; Computational modeling; Graphics processing unit; Instruction sets; Kernel; Mathematical model; Memory management; AMR; GPU; Poisson-Gravity solvers; benchmarks; hydrodynamics; simulations;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High Performance Computing, Networking, Storage and Analysis (SC), 2011 International Conference for
  • Conference_Location
    Seattle, WA
  • Electronic_ISBN
    978-1-4503-0771-0
  • Type

    conf

  • Filename
    6114450