• DocumentCode
    576926
  • Title

    Can Network-Offload Based Non-blocking Neighborhood MPI Collectives Improve Communication Overheads of Irregular Graph Algorithms?

  • Author

    Kandalla, K. ; Buluç, A. ; Subramoni, H. ; Tomko, K. ; Vienne, J. ; Oliker, L. ; Panda, D.K.

  • Author_Institution
    Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
  • fYear
    2012
  • fDate
    24-28 Sept. 2012
  • Firstpage
    222
  • Lastpage
    230
  • Abstract
    Graph-based computations are commonly used across data-intensive computing domains ranging from social networks to biological systems. On distributed memory systems, graph algorithms involve explicit communication between processes and often exhibit sparse, irregular behavior. Minimizing these communication overheads is critical to meet the graph-theoretic analysis demands of emerging "big data" applications. In this paper, we explore the challenges associated with reducing the communication overheads of a popular 2D Breadth First Search (BFS) implementation in the CombBLAS library. This BFS algorithm relies on two common MPI collectives, MPI_Alltoallv and MPI_Allgatherv, to exchange data between processes, and these account for more than 20% of the overall run time. Re-designing parallel applications to take advantage of MPI-3 non-blocking collectives to achieve latency hiding is an active area of research. However, the 2D BFS algorithm in CombBLAS is not directly amenable to such a re-design through common overlap techniques such as double-buffering. In this paper, we propose to re-design the BFS algorithm to leverage MPI-3 non-blocking, neighborhood collective communication operations to achieve fine-grained computation/communication overlap. We also leverage the CORE-Direct network offload feature of the ConnectX-2 InfiniBand adapter from Mellanox to design highly efficient and scalable non-blocking, neighborhood Alltoallv and Allgatherv collective operations. Our experimental evaluations show that we can improve the communication overheads of the 2D BFS algorithm by up to 70% with 1,936 processes.
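
    The overlap pattern the abstract describes — post an MPI-3 non-blocking neighborhood collective on a graph communicator, compute locally, then wait — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy topology (each rank as its own sole neighbor) and the function name are placeholders so the sketch runs at any process count; a real 2D BFS would list the ranks of a process's grid row or column as neighbors.

    ```c
    #include <mpi.h>

    /* Sketch of fine-grained overlap with MPI_Ineighbor_alltoallv (MPI-3).
     * Returns the value this rank received, which in this toy topology is
     * its own rank, since each rank is its only neighbor. */
    int neighbor_exchange_sketch(void) {
        MPI_Init(NULL, NULL);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Hypothetical toy topology: each process neighbors only itself.
         * A 2D BFS would instead enumerate its processor row/column. */
        int neighbor[1] = { rank };
        MPI_Comm graph_comm;
        MPI_Dist_graph_create_adjacent(MPI_COMM_WORLD,
                                       1, neighbor, MPI_UNWEIGHTED,
                                       1, neighbor, MPI_UNWEIGHTED,
                                       MPI_INFO_NULL, 0, &graph_comm);

        int sendbuf[1] = { rank }, recvbuf[1] = { -1 };
        int counts[1] = { 1 }, displs[1] = { 0 };

        /* Post the collective; it returns immediately. */
        MPI_Request req;
        MPI_Ineighbor_alltoallv(sendbuf, counts, displs, MPI_INT,
                                recvbuf, counts, displs, MPI_INT,
                                graph_comm, &req);

        /* ... local frontier/SpMV computation would overlap here ... */

        MPI_Wait(&req, MPI_STATUS_IGNORE);

        MPI_Comm_free(&graph_comm);
        MPI_Finalize();
        return recvbuf[0];
    }
    ```

    The CORE-Direct offload the paper evaluates progresses such collectives on the network adapter, so the window between the post and the wait is where the claimed computation/communication overlap is realized.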
  • Keywords
    application program interfaces; distributed memory systems; graph theory; libraries; message passing; parallel processing; tree searching; 2D BFS algorithm; 2D breadth first search implementation; BFS implementation; CORE-Direct network offload feature; CombBLAS library; ConnectX-2 InfiniBand adapter; MPI Allgatherv; MPI Alltoallv; Mellanox; big data applications; communication overhead minimization; distributed memory systems; double-buffering; graph-based computations; graph-theoretic analysis; data intensive computing domains; irregular graph algorithms; latency hiding; message passing interface; neighborhood collective communication operations; network-offload based nonblocking neighborhood MPI collectives; parallel applications; Algorithm design and analysis; Benchmark testing; Clustering algorithms; Libraries; Sorting; Topology; Vectors; 2D BFS Algorithms; Collective Offload and InfiniBand; MPI-3 non-blocking collectives
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    2012 IEEE International Conference on Cluster Computing Workshops (CLUSTER WORKSHOPS)
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4673-2893-7
  • Type

    conf

  • DOI
    10.1109/ClusterW.2012.40
  • Filename
    6355868