• Title of article

    Fast discontinuous Galerkin lattice-Boltzmann simulations on GPUs via maximal kernel fusion Original Research Article

  • Author/Authors

    Marco D. Mazzeo (author)

  • Issue Information
    Monthly, serial-numbered issue, year 2013
  • Pages
    13
  • From page
    537
  • To page
    549
  • Abstract
    A GPU implementation of the discontinuous Galerkin lattice-Boltzmann method with square spectral elements, highly optimised for speed and precision of calculations, is presented. An extensive analysis of the numerous variants of the fluid solver reveals that the best performance is obtained by maximising CUDA kernel fusion and by arranging the resulting kernel tasks so as to trigger memory-coherent and scattered loads in a specific manner, albeit at the cost of introducing cross-thread load imbalance. Surprisingly, any attempt to eliminate this imbalance, to maximise thread occupancy, or to adopt conventional work tiling or distinct custom kernels highly tuned via ad hoc data and computation layouts invariably deteriorates performance. As such, this work sheds light on the possibility of hiding the fetch latencies of workloads involving heterogeneous loads more effectively than frequently suggested techniques do. When simulating the lid-driven cavity on an NVIDIA GeForce GTX 480 via a 5-stage 4th-order Runge–Kutta (RK) scheme, the first four digits or more of the obtained centreline velocity values converge to those of the state-of-the-art literature data, at a simulation speed of 7.0G primitive variable updates per second during the collision stage and 4.4G during each RK step of the advection, using double-precision arithmetic (DPA) and a computational grid of only [value missing] [value missing]-point elements. The new programming engine leads to about [value missing] performance with respect to the best programming guidelines in the field. The new fluid solver on the above GPU is also 20–30 times faster than a highly optimised version running on a single core of an Intel Xeon X5650 2.66 GHz.
  • Keywords
    Spectral elements , GPU computing , Lattice-Boltzmann method , Discontinuous-Galerkin method , CUDA , Mesh-based methods
  • Journal title
    Computer Physics Communications
  • Serial Year
    2013
  • Record number

    1136475