DocumentCode :
1925605
Title :
Designing an Offloaded Nonblocking MPI_Allgather Collective Using CORE-Direct
Author :
Inozemtsev, Grigori ; Afsahi, Ahmad
Author_Institution :
Dept. of Electr. & Comput. Eng., Queen's Univ., Kingston, ON, Canada
fYear :
2012
fDate :
24-28 Sept. 2012
Firstpage :
477
Lastpage :
485
Abstract :
Collective communication operations in the Message Passing Interface (MPI) consume a significant amount of time at scale, degrading the performance of scientific applications. Optimizing collectives is key to application performance and scalability. This paper focuses on hiding the latency of the allgather collective by efficiently offloading it to the networking hardware. We have investigated the use of Mellanox CORE-Direct offloading technology for independent progression of communication within the collective in order to achieve high communication/computation overlap. This study evaluates several design options for the nonblocking allgather collective and discusses implementations of offloaded Standard Exchange, Ring and Bruck algorithms in flat and hierarchical communicators under single-port and k-port modelling. We have applied our findings to improving the performance of the redesigned Radix Sort application kernel. Performance results suggest that our offloaded nonblocking allgather compares favourably to the blocking variant (with improvements of up to 68% for medium messages in a hierarchical collective) while providing high overlap capability. Multiport modelling is shown to be beneficial, especially in a flat communicator. Radix Sort enjoys up to 40% improvement in its runtime.
Keywords :
message passing; Bruck algorithm; Mellanox CORE-Direct offloading technology; Ring algorithm; collective communication operation; hierarchical collective; k-port modelling; message passing interface; multiport modelling; networking hardware; offloaded nonblocking MPI_allgather collective; offloaded standard exchange; radix sort application kernel; scientific application; single-port modelling; Algorithm design and analysis; Context; Kernel; Message systems; Protocols; Standards; MPI; allgather; collective communication; coredirect; message passing; offloading;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Cluster Computing (CLUSTER), 2012 IEEE International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4673-2422-9
Type :
conf
DOI :
10.1109/CLUSTER.2012.75
Filename :
6337811