DocumentCode :
2440491
Title :
Optimization of applications with non-blocking neighborhood collectives via multisends on the Blue Gene/P supercomputer
Author :
Kumar, Sameer ; Heidelberger, Philip ; Chen, Dong ; Hines, Michael
Author_Institution :
T. J. Watson Res. Center, IBM, Yorktown Heights, NY, USA
fYear :
2010
fDate :
19-23 April 2010
Firstpage :
1
Lastpage :
11
Abstract :
We explore the multisend interface as a data mover interface to optimize applications with neighborhood collective communication operations. One of the limitations of the current MPI 2.1 standard is that the vector collective calls require counts and displacements (zero and non-zero bytes) to be specified for all the processors in the communicator. Further, all the collective calls in MPI 2.1 are blocking and do not permit overlap of communication with computation in the same thread of execution. However, multisends are non-blocking calls that permit overlap of computation and communication. We present the record replay persistent optimization to the multisend interface that minimizes the processor overhead of initiating the collective. We present four different case studies with the multisend API on Blue Gene/P: (i) 3D-FFT, (ii) 4D nearest neighbor exchange as used in Quantum Chromodynamics, (iii) NAMD and (iv) neural network simulator NEURON. Performance results show 1.9× speedup with 32³ 3D-FFTs, 1.9× speedup for 4D nearest neighbor exchange with the 2⁴ problem, 1.6× speedup in NAMD and almost 3× speedup in NEURON with 256K cells and 1k connections/cell.
Keywords :
application program interfaces; fast Fourier transforms; message passing; multiprocessing systems; neural nets; quantum chromodynamics; 3D-FFT; 4D nearest neighbor exchange; Blue Gene/P supercomputer; MPI 2.1 standard; NAMD; NEURON; application optimization; data mover interface; multisend API; multisend interface; neighborhood collective communication operation; neural network simulator; nonblocking neighborhood collectives; processor overhead minimization; quantum chromodynamics; record replay persistent optimization; vector collective calls; Application software; Buffer storage; Communication standards; Computer science; Libraries; Nearest neighbor searches; Neural networks; Neurons; Proposals; Supercomputers;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel & Distributed Processing (IPDPS), 2010 IEEE International Symposium on
Conference_Location :
Atlanta, GA
ISSN :
1530-2075
Print_ISBN :
978-1-4244-6442-5
Type :
conf
DOI :
10.1109/IPDPS.2010.5470407
Filename :
5470407