  • DocumentCode
    3077205
  • Title
    Partition-Aware Routing to Improve Network Isolation in Infiniband Based Multi-tenant Clusters
  • Author
    Zahid, Feroz ; Gran, Ernst Gunnar ; Bogdanski, Bartosz ; Johnsen, Bjorn Dag ; Skeie, Tor
  • Author_Institution
    Simula Res. Lab., Lysaker, Norway
  • fYear
    2015
  • fDate
    4-7 May 2015
  • Firstpage
    189
  • Lastpage
    198
  • Abstract
    InfiniBand (IB) is a widely used network interconnect for modern high-performance computing systems. In large IB fabrics, isolation of nodes is provided through partitioning. The routing algorithm, however, is unaware of these partitions, so traffic flows belonging to different partitions might share links inside the network fabric. This sharing of intermediate links creates interference, which is particularly critical to avoid in multi-tenant environments such as a cloud. In such systems, each tenant should experience predictable network performance, unaffected by the workload of other tenants. In addition, current routing schemes consider routes that cross partition boundaries when distributing routes onto links in the network, even though these routes will never be used. The result is degraded load-balancing. In this paper, we present a novel partition-aware fat-tree routing algorithm, pFTree. The pFTree algorithm utilizes several mechanisms to provide network-wide isolation of partitions belonging to different tenant groups. Given the available network resources, pFTree starts by isolating partitions at the physical link level, and then moves on to utilize virtual lanes, if needed. Our experiments and simulations show that pFTree is able to significantly reduce the effect of inter-partition interference without any additional functional overhead. Furthermore, pFTree also provides improved load-balancing over the de facto standard IB fat-tree routing algorithm.
  • Keywords
    internetworking; resource allocation; telecommunication network routing; telecommunication traffic; tree data structures; IB fabrics; IB fat-tree routing algorithm; InfiniBand-based multitenant clusters; high-performance computing systems; interpartition interference; link level; load-balancing; network fabric; network interconnect; network isolation improvement; network performance; network resources; network-wide partition isolation; node isolation; node partitioning; pFTree algorithm; partition boundaries; partition-aware fat-tree routing algorithm; route distribution; tenant groups; tenant workload; traffic flows; virtual lanes; Bandwidth; Fabrics; Partitioning algorithms; Ports (Computers); Quality of service; Routing; Topology; InfiniBand; interconnection networks; performance isolation; routing; virtual channels;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Cluster, Cloud and Grid Computing (CCGrid), 2015 15th IEEE/ACM International Symposium on
  • Conference_Location
    Shenzhen
  • Type
    conf
  • DOI
    10.1109/CCGrid.2015.96
  • Filename
    7152485
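The abstract's core idea, isolating partitions on separate physical links first and falling back to virtual lanes only when links run out, can be illustrated with a toy model. The sketch below is an assumption-laden illustration, not the pFTree algorithm from the paper: it models only the parallel links leaving a single switch, treats each (link, virtual lane) pair as a "channel", and counts how many channels carry routes from more than one partition. The function names (`unaware_assignment`, `partition_aware_assignment`, `shared_channels`) and the round-robin policies are hypothetical, chosen only to contrast partition-unaware and partition-aware placement.

```python
from collections import defaultdict

# Hypothetical toy model (NOT pFTree): routes from ONE switch toward
# destination nodes are spread over n_links parallel links, each offering
# n_vls virtual lanes. A "channel" is a (link, virtual_lane) pair.


def unaware_assignment(dests, n_links):
    """Round-robin destinations over links, ignoring partitions (all on VL 0)."""
    return {node: (i % n_links, 0) for i, (node, _part) in enumerate(dests)}


def partition_aware_assignment(dests, n_links, n_vls):
    """Isolate partitions on disjoint links first; fall back to VLs if needed."""
    if not dests:
        return {}
    partitions = sorted({part for _node, part in dests})
    plan = {}  # partition -> (links it may use, virtual lane it uses)
    if len(partitions) <= n_links:
        # Enough links: give every partition its own disjoint subset of links.
        groups = defaultdict(list)
        for link in range(n_links):
            groups[partitions[link % len(partitions)]].append(link)
        plan = {part: (groups[part], 0) for part in partitions}
    else:
        # Too few links: partitions must share links, so partitions that share
        # a link are placed on different VLs (best effort once VLs run out).
        for i, part in enumerate(partitions):
            plan[part] = ([i % n_links], (i // n_links) % n_vls)
    # Round-robin each partition's destinations over that partition's links.
    next_idx = defaultdict(int)
    result = {}
    for node, part in dests:
        links, vl = plan[part]
        result[node] = (links[next_idx[part] % len(links)], vl)
        next_idx[part] += 1
    return result


def shared_channels(assignment, dests):
    """Count (link, VL) channels that carry routes of more than one partition."""
    part_of = dict(dests)
    carried = defaultdict(set)
    for node, channel in assignment.items():
        carried[channel].add(part_of[node])
    return sum(1 for parts in carried.values() if len(parts) > 1)


if __name__ == "__main__":
    dests = [("n0", "A"), ("n1", "A"), ("n2", "B"), ("n3", "B")]
    print(shared_channels(unaware_assignment(dests, 2), dests))  # 2
    print(shared_channels(partition_aware_assignment(dests, 2, 1), dests))  # 0
```

With two partitions over two links, the unaware placement mixes both partitions on every link, while the aware placement gives each partition its own link. With three partitions over two links, the aware sketch keeps them apart only if at least two virtual lanes are available; with a single lane, two partitions end up on the same channel. That mirrors, in miniature, the abstract's ordering of physical-link isolation before virtual lanes.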