• DocumentCode
    1783344
  • Title
    A Step towards Energy Efficient Computing: Redesigning a Hydrodynamic Application on CPU-GPU
  • Author
    Dong, Tingxing ; Dobrev, Veselin ; Kolev, Tzanio ; Rieben, Robert ; Tomov, Stanimire ; Dongarra, Jack
  • Author_Institution
    Innovative Comput. Lab., Univ. of Tennessee, Knoxville, TN, USA
  • fYear
    2014
  • fDate
    19-23 May 2014
  • Firstpage
    972
  • Lastpage
    981
  • Abstract
    Power and energy consumption are becoming an increasing concern in high-performance computing. Compared to multi-core CPUs, GPUs have much better performance per watt. In this paper we discuss efforts to redesign, using GPUs, the most computation-intensive parts of BLAST, an application that solves the equations for compressible hydrodynamics with high-order finite elements. In order to exploit the hardware parallelism of GPUs and achieve high performance, we implemented custom linear algebra kernels. We intensively optimized our CUDA kernels by exploiting the memory hierarchy, and they substantially outperform the vendor's library routines. We proposed an autotuning technique to adapt our CUDA kernels to the orders of the finite element method. Compared to a previous base implementation, our redesign and optimization lowered the energy consumption of the GPU in two respects: 60% less time to solution and 10% less power required. Compared to the CPU-only solution, our GPU-accelerated BLAST obtained a 2.5× overall speedup and 1.42× energy efficiency (green up) using 4th order (Q_4) finite elements, and a 1.9× speedup and 1.27× green up using 2nd order (Q_2) finite elements.
  • Keywords
    finite element analysis; graphics processing units; hydrodynamics; linear algebra; mechanical engineering computing; parallel architectures; power aware computing; BLAST; CPU-GPU; CUDA kernels; autotuning technique; compressible hydrodynamics; computation intensive parts; custom linear algebra kernels; energy efficient computing; finite element method; high order finite elements; high performance computing; hydrodynamic application; multicore CPU; vendor library routines; Bandwidth; Force; Graphics processing units; Instruction sets; Kernel; Registers; Sparse matrices; Energy; FEM; GPU; Power; hydrodynamics;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Processing Symposium, 2014 IEEE 28th International
  • Conference_Location
    Phoenix, AZ
  • ISSN
    1530-2075
  • Print_ISBN
    978-1-4799-3799-8
  • Type
    conf
  • DOI
    10.1109/IPDPS.2014.103
  • Filename
    6877327