• DocumentCode
    2547084
  • Title

    Building Multirail InfiniBand Clusters: MPI-Level Design and Performance Evaluation

  • Author

    Liu, Jiuxing ; Vishnu, Abhinav ; Panda, Dhabaleswar K.

  • Author_Institution
    Ohio State University
  • fYear
    2004
  • fDate
    06-12 Nov. 2004
  • Firstpage
    33
  • Lastpage
    33
  • Abstract
    In the area of cluster computing, InfiniBand is becoming increasingly popular due to its open standard and high performance. However, even with InfiniBand, network bandwidth can still become the performance bottleneck for some of today’s most demanding applications. In this paper, we study the problem of how to overcome the bandwidth bottleneck by using multirail networks. We present different ways of setting up multirail networks with InfiniBand and propose a unified MPI design that can support all these approaches. We have also discussed various important design issues and provided in-depth discussions of different policies of using multirail networks, including an adaptive striping scheme that can dynamically change the striping parameters based on current system conditions. We have implemented our design and evaluated it using both microbenchmarks and applications. Our performance results show that multirail networks can significantly improve MPI communication performance. With a two-rail InfiniBand cluster, we have achieved almost twice the bandwidth and half the latency for large messages compared with the original MPI. At the application level, the multirail MPI can significantly reduce communication time as well as running time depending on the communication pattern. We have also shown that the adaptive striping scheme can achieve excellent performance without a priori knowledge of the bandwidth of each rail.
  • Keywords
    Bandwidth; Buildings; Communication switching; Delay; Fabrics; Protocols; Rails; Read-write memory; Round robin; Testing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Supercomputing, 2004. Proceedings of the ACM/IEEE SC2004 Conference
  • Print_ISBN
    0-7695-2153-3
  • Type

    conf

  • DOI
    10.1109/SC.2004.15
  • Filename
    1392963