• DocumentCode
    3247825
  • Title

    Scalable MPI design over InfiniBand using eXtended Reliable Connection

  • Author

    Koop, Matthew J. ; Sridhar, Jaidev K. ; Panda, Dhabaleswar K.

  • Author_Institution
    Network-Based Comput. Lab., Ohio State Univ., Columbus, OH
  • fYear
    2008
  • fDate
    Sept. 29 2008-Oct. 1 2008
  • Firstpage
    203
  • Lastpage
    212
  • Abstract
    A significant component of a high-performance cluster is the compute node interconnect. InfiniBand is an interconnect for such systems that is enjoying wide success due to its low latency (1.0-3.0 μs), high bandwidth, and other features. The Message Passing Interface (MPI) is the dominant programming model for parallel scientific applications. As a result, the MPI library and the interconnect play a significant role in scalability. These clusters continue to scale to ever-increasing levels, making that role increasingly important. As an example, the "Ranger" system at the Texas Advanced Computing Center (TACC) includes over 60,000 cores with nearly 4000 InfiniBand ports. Previous work has shown that memory usage for connections alone, when using the Reliable Connection (RC) transport of InfiniBand, can reach hundreds of megabytes per process at that scale. To address these scalability problems, a new InfiniBand transport, eXtended Reliable Connection (XRC), has been introduced. In this paper we describe XRC and design MPI over this new transport. We describe the variety of design choices that must be made as well as the various optimizations that XRC allows. We implement our designs and evaluate them on an InfiniBand cluster against RC-based designs. Memory scalability, in terms of both connection memory and memory efficiency for communication buffers, is evaluated for all of the configurations. The connection memory scalability evaluation shows a potential 100-fold improvement over a similarly configured RC-based design. Evaluation using NAMD shows a 10% performance improvement for our XRC-based prototype on the jac2000 benchmark.
  • Keywords
    message passing; storage management; workstation clusters; InfiniBand transport; dominant programming model; extended reliable connection; high-performance cluster; memory efficiency; message passing interface; parallel scientific application; scalable MPI library design; Bandwidth; Computer networks; Delay; Laboratories; Large-scale systems; Libraries; Message passing; Parallel programming; Peer to peer computing; Scalability;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cluster Computing, 2008 IEEE International Conference on
  • Conference_Location
    Tsukuba
  • ISSN
    1552-5244
  • Print_ISBN
    978-1-4244-2639-3
  • Electronic_ISBN
    1552-5244
  • Type

    conf

  • DOI
    10.1109/CLUSTR.2008.4663773
  • Filename
    4663773