Title of article :
Fast discontinuous Galerkin lattice-Boltzmann simulations on GPUs via maximal kernel fusion Original Research Article
Author/Authors :
Marco D. Mazzeo, author
Issue Information :
Monthly, serial issue, year 2013
Pages :
13
From page :
537
To page :
549
Abstract :
A GPU implementation of the discontinuous Galerkin lattice-Boltzmann method with square spectral elements, highly optimised for speed and precision of calculations, is presented. An extensive analysis of the numerous variants of the fluid solver reveals that the best performance is obtained by maximising CUDA kernel fusion and by arranging the resulting kernel tasks so as to trigger memory-coherent and scattered loads in a specific manner, albeit at the cost of introducing cross-thread load imbalance. Surprisingly, any attempt to eliminate this imbalance, to maximise thread occupancy, or to adopt conventional work tiling or distinct custom kernels highly tuned via ad hoc data and computation layouts invariably degrades performance. As such, this work sheds light on the possibility of hiding fetch latencies of workloads involving heterogeneous loads more effectively than frequently suggested techniques do. When simulating the lid-driven cavity on an NVIDIA GeForce GTX 480 via a 5-stage 4th-order Runge–Kutta (RK) scheme, the first four or more digits of the obtained centreline velocity values converge to those of the state-of-the-art literature data at a simulation speed of 7.0G primitive variable updates per second during the collision stage and 4.4G during each RK step of the advection, using double-precision arithmetic (DPA) and a computational grid of only [image] [image]-point elements. The new programming engine leads to about [image] performance with respect to the best programming guidelines in the field. The new fluid solver on the above GPU is also 20–30 times faster than a highly optimised version running on a single core of an Intel Xeon X5650 2.66 GHz.
Keywords :
Spectral elements , GPU computing , Lattice-Boltzmann method , Discontinuous-Galerkin method , CUDA , Mesh-based methods
Journal title :
Computer Physics Communications
Serial Year :
2013
Record number :
1136475