DocumentCode :
625629
Title :
Acceleration of an Asynchronous Message Driven Programming Paradigm on IBM Blue Gene/Q
Author :
Kumar, Sameer ; Sun, Yanhua ; Kale, Laxmikant V.
Author_Institution :
IBM T.J. Watson Res. Center, Yorktown Heights, NY, USA
fYear :
2013
fDate :
20-24 May 2013
Firstpage :
689
Lastpage :
699
Abstract :
IBM Blue Gene/Q is the next-generation Blue Gene machine, which can scale to tens of petaflops with 16 cores and 64 hardware threads per node. However, significant effort is required to fully exploit its capacity across applications that span multiple programming models. In this paper, we focus on Charm++, an asynchronous message-driven parallel programming model. Because its asynchronous behavior differs substantially from that of MPI, porting it efficiently to BG/Q is a challenge. On the other hand, the significant synergy between BG/Q software and Charm++ creates opportunities for effective utilization of BG/Q resources. We describe several novel fine-grained threading techniques in Charm++ that exploit the hardware features of the BG/Q compute chip. These include the use of L2 atomics to implement lockless producer-consumer queues that accelerate communication between threads, fast memory allocators, and hardware communication threads that are awakened via low-overhead interrupts from the BG/Q wakeup unit. Bursts of short messages are processed using the ManytoMany interface to reduce runtime overhead. We also present techniques to optimize NAMD computation via Quad Processing Unit (QPX) vector instructions, and to accelerate the message rate via communication threads in order to optimize the Particle Mesh Ewald (PME) computation. We demonstrate the benefits of our techniques on two benchmarks: the 3D Fast Fourier Transform and the molecular dynamics application NAMD. For the 92,000-atom ApoA1 molecule, we achieved 683 μs/step with PME every 4 steps and 782 μs/step with PME every step.
Keywords :
IBM computers; fast Fourier transforms; molecular dynamics method; multiprocessing systems; parallel machines; parallel programming; 3D fast Fourier transform; ApoA1 molecule; BG/Q compute chip; BG/Q resource utilization; BG/Q software; BG/Q wakeup unit; Blue Gene machine; Charm++; IBM Blue Gene/Q; L2 atomics; ManytoMany interface; NAMD computation; PME computation; QPX vector instructions; asynchronous message driven parallel programming model; fine-grained threading techniques; hardware communication threads; lockless producer-consumer queues; memory allocators; molecular dynamics application; multiple programming models; overhead interrupts; particle mesh Ewald; quad processing unit; runtime overhead reduction; Acceleration; Hardware; Libraries; Message systems; Peer-to-peer computing; Radiation detectors; Runtime; Blue Gene/Q; Charm++; L2Atomic Queue; communication thread; many to many; message-driven;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel & Distributed Processing (IPDPS), 2013 IEEE 27th International Symposium on
Conference_Location :
Boston, MA
ISSN :
1530-2075
Print_ISBN :
978-1-4673-6066-1
Type :
conf
DOI :
10.1109/IPDPS.2013.83
Filename :
6569854