Regional Information Center for Science and Technology - Designing an Offloaded Nonblocking MPI

DocumentCode :

1925605

Title :

Designing an Offloaded Nonblocking MPI_Allgather Collective Using CORE-Direct

Author :

Inozemtsev, Grigori ; Afsahi, Ahmad

Author_Institution :

Dept. of Electr. & Comput. Eng., Queen's Univ., Kingston, ON, Canada

fYear :

2012

fDate :

24-28 Sept. 2012

Firstpage :

477

Lastpage :

485

Abstract :

Collective communication operations in the Message Passing Interface (MPI) consume a significant amount of time at scale, degrading the performance of scientific applications. Optimizing collectives is key to application performance and scalability. This paper focuses on hiding the latency of the allgather collective by efficiently offloading it to the networking hardware. We have investigated the use of Mellanox CORE-Direct offloading technology for independent progression of communication within the collective in order to achieve high communication/computation overlap. This study evaluates several design options for the nonblocking allgather collective and discusses implementations of offloaded Standard Exchange, Ring and Bruck algorithms in flat and hierarchical communicators under single-port and k-port modelling. We have applied our findings to improving the performance of the redesigned Radix Sort application kernel. Performance results suggest that our offloaded nonblocking allgather compares favourably to the blocking variant (with improvements of up to 68% for medium messages in a hierarchical collective) while providing high overlap capability. Multiport modelling is shown to be beneficial, especially in a flat communicator. Radix Sort enjoys up to 40% improvement in its runtime.
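For readers unfamiliar with the Ring and Bruck allgather algorithms named above, the following is a minimal single-process sketch of their textbook communication schedules. It is not the authors' CORE-Direct offloaded implementation and makes no MPI calls: each "rank" is just a list index, and every rank's outgoing message is computed before any rank updates its state, which models one synchronous single-port communication round. The function names `ring_allgather` and `bruck_allgather` are hypothetical.

```python
def ring_allgather(blocks):
    """Simulate the Ring allgather; return (per-rank buffers, number of rounds)."""
    p = len(blocks)
    # buf[r][j] holds block j once rank r has received it, else None.
    buf = [[None] * p for _ in range(p)]
    for r in range(p):
        buf[r][r] = blocks[r]
    rounds = 0
    for s in range(p - 1):
        # In round s, rank r forwards block (r - s) mod p to its right neighbour.
        msgs = [((r + 1) % p, (r - s) % p, buf[r][(r - s) % p]) for r in range(p)]
        for dst, idx, val in msgs:
            buf[dst][idx] = val
        rounds += 1
    return buf, rounds


def bruck_allgather(blocks):
    """Simulate the Bruck allgather; return (per-rank buffers, number of rounds)."""
    p = len(blocks)
    # Rank r keeps a contiguous local list whose entry i is block (r + i) mod p.
    local = [[blocks[r]] for r in range(p)]
    rounds = 0
    d = 1
    while d < p:
        count = min(d, p - d)
        # Rank r receives the first `count` blocks held by rank (r + d) mod p
        # (equivalently, each rank sends its first `count` blocks to rank r - d).
        msgs = [local[(r + d) % p][:count] for r in range(p)]
        for r in range(p):
            local[r].extend(msgs[r])
        d *= 2
        rounds += 1
    # Final local rotation so that index j holds block j on every rank.
    result = [[local[r][(j - r) % p] for j in range(p)] for r in range(p)]
    return result, rounds


if __name__ == "__main__":
    data = [f"b{i}" for i in range(7)]
    for name, fn in (("ring", ring_allgather), ("bruck", bruck_allgather)):
        out, n = fn(data)
        print(name, n, all(row == data for row in out))
```

With 7 ranks, the Ring schedule needs p - 1 = 6 rounds that each move one block per rank, while the Bruck schedule needs ceil(log2 p) = 3 rounds with growing message sizes. This trade-off between round count and per-round volume is why Bruck is commonly favoured for small messages and Ring for large ones.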

Keywords :

message passing; Bruck algorithm; Mellanox CORE-Direct offloading technology; Ring algorithm; collective communication operation; hierarchical collective; k-port modelling; message passing interface; multiport modelling; networking hardware; offloaded nonblocking MPI_allgather collective; offloaded standard exchange; radix sort application kernel; scientific application; single-port modelling; Algorithm design and analysis; Context; Kernel; Message systems; Protocols; Standards; MPI; allgather; collective communication; coredirect; message passing; offloading;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Cluster Computing (CLUSTER), 2012 IEEE International Conference on

Conference_Location :

Beijing

Print_ISBN :

978-1-4673-2422-9

Type :

conf

DOI :

10.1109/CLUSTER.2012.75

Filename :

6337811

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1925605