• DocumentCode
    2534369
  • Title
    High Performance Design and Implementation of Nemesis Communication Layer for Two-Sided and One-Sided MPI Semantics in MVAPICH2
  • Author
    Luo, Miao; Potluri, Sreeram; Lai, Ping; Mancini, Emilio P.; Subramoni, Hari; Kandalla, Krishna; Sur, Sayantan; Panda, Dhabaleswar K.
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
  • fYear
    2010
  • fDate
    13-16 Sept. 2010
  • Firstpage
    377
  • Lastpage
    386
  • Abstract
    High End Computing (HEC) systems are being deployed with eight to sixteen compute cores, with 64 to 128 cores/node being envisioned for exascale systems. MVAPICH2 is a popular implementation of MPI-2 specifically designed and optimized for InfiniBand, iWARP and RDMA over Converged Ethernet (RoCE). MVAPICH2 is based on MPICH2 from ANL. Recently, MPICH2 has been redesigned in an effort to optimize intra-node communication for future many-core systems. The new communication layer in MPICH2 is called Nemesis; it is highly optimized for shared memory message passing and has a modular design that supports various high-performance interconnects. In this paper, we explore the challenges involved in designing the next-generation MVAPICH2 stack, leveraging the Nemesis communication layer. We observe that Nemesis does not provide abstractions for one-sided communication. We propose an extended Nemesis interface for optimized one-sided communication and provide design details. Our experimental evaluation shows that the proposed one-sided interface extensions provide significantly better performance than the basic Nemesis interface. For example, inter-node MPI_Put bandwidth increased from 1,800 MB/s to 3,000 MB/s, and latency for small messages went down by 13%. Additionally, with our proposed designs, we are able to demonstrate performance gains for small messages compared to the existing MVAPICH2 CH3 implementation. The designs proposed in this paper are a superset of the options currently available to MVAPICH2 users and provide the best combination of performance and modularity.
  • Keywords
    application program interfaces; local area networks; message passing; HEC systems; InfiniBand; RDMA; converged Ethernet; exascale systems; high end computing system; high performance design; high-performance interconnects; iWARP; intranode communication; many-core systems; nemesis communication layer; next-generation MVAPICH2 stack; one-sided MPI semantics; optimized one-sided communication; shared memory message passing; two-sided MPI semantics; Ethernet networks; Hardware; Open source software; Optimization; Semantics; Sockets; Synchronization; MPICH2; MVAPICH2; RMA;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel Processing Workshops (ICPPW), 2010 39th International Conference on
  • Conference_Location
    San Diego, CA
  • ISSN
    1530-2016
  • Print_ISBN
    978-1-4244-7918-4
  • Electronic_ISBN
    1530-2016
  • Type
    conf
  • DOI
    10.1109/ICPPW.2010.58
  • Filename
    5599096