DocumentCode :
1783363
Title :
A Framework for Lattice QCD Calculations on GPUs
Author :
Winter, F.T. ; Clark, M.A. ; Edwards, R.G. ; Joó, B.
Author_Institution :
Thomas Jefferson Nat. Accel. Facility, Newport News, VA, USA
fYear :
2014
fDate :
19-23 May 2014
Firstpage :
1073
Lastpage :
1082
Abstract :
Computing platforms equipped with accelerators like GPUs have proven to provide great computational power. However, exploiting such platforms for existing scientific applications is not a trivial task. Current GPU programming frameworks such as CUDA C/C++ require low-level programming from the developer in order to achieve high-performance code. As a result, porting of applications to GPUs is typically limited to time-dominant algorithms and routines, leaving the remainder unaccelerated, which can create a serious Amdahl's law bottleneck. The Lattice QCD application Chroma allows us to explore a different porting strategy. The layered structure of the software architecture logically separates the data-parallel layer from the application layer. The QCD Data-Parallel (QDP) software layer provides data types and expressions with stencil-like operations suitable for lattice field theory. Chroma implements its algorithms in terms of this high-level interface. Thus, by porting the low-level layer, one effectively ports the whole application layer in one stroke. The QDP-JIT/PTX library, our reimplementation of the low-level layer, provides a framework for Lattice QCD calculations on the CUDA architecture. The complete software interface is supported, and thus applications can run unaltered on GPU-based parallel computers. This reimplementation was made possible by the availability of a JIT compiler that translates an assembly language (PTX) into GPU code. The existing expression templates enabled us to employ compile-time computations in order to build code generators and to automate memory management for CUDA. Our implementation has allowed us to deploy the full Chroma gauge-generation program on large-scale GPU-based machines such as Titan and Blue Waters and to accelerate the calculation by more than an order of magnitude.
Keywords :
graphics processing units; parallel architectures; quantum chromodynamics; software architecture; Blue Waters; CUDA C/C++; GPU programming frameworks; JIT compiler; QDP-JIT/PTX library; Titan; accelerators; application layer; assembly language; code generators; compile-time computations; computational power; full Chroma gauge-generation program; large-scale machines; lattice QCD calculations; lattice field theory; low-level programming; memory management; parallel computers; porting strategy; software interface; stencil-like operations; time-dominant algorithms; Computer architecture; Generators; Graphics processing units; Indexes; Kernel; Lattices; Libraries; Application framework; C++; CUDA; GPU; JIT; Lattice QCD; PTX
fLanguage :
English
Publisher :
IEEE
Conference_Title :
2014 IEEE 28th International Parallel and Distributed Processing Symposium
Conference_Location :
Phoenix, AZ
ISSN :
1530-2075
Print_ISBN :
978-1-4799-3799-8
Type :
conf
DOI :
10.1109/IPDPS.2014.112
Filename :
6877336