Title :
Partition-Aware Routing to Improve Network Isolation in Infiniband Based Multi-tenant Clusters
Author :
Zahid, Feroz ; Gran, Ernst Gunnar ; Bogdanski, Bartosz ; Johnsen, Bjorn Dag ; Skeie, Tor
Author_Institution :
Simula Res. Lab., Lysaker, Norway
Abstract :
InfiniBand (IB) is a widely used network interconnect for modern high-performance computing systems. In large IB fabrics, isolation of nodes is provided through partitioning. The routing algorithm, however, is unaware of these partitions in the network. As a result, traffic flows belonging to different partitions might share links inside the network fabric. This sharing of intermediate links creates interference, which is particularly critical to avoid in multi-tenant environments like a cloud. In such systems, each tenant should experience predictable network performance, unaffected by the workload of other tenants. In addition, using current routing schemes, routes crossing partition boundaries are considered when distributing routes onto links in the network, despite the fact that these routes will never be used. The result is degraded load-balancing. In this paper, we present a novel partition-aware fat-tree routing algorithm, pFTree. The pFTree algorithm utilizes several mechanisms to provide network-wide isolation of partitions belonging to different tenant groups. Given the available network resources, pFTree starts by isolating partitions at the physical link level, and then moves on to utilize virtual lanes, if needed. Our experiments and simulations show that pFTree is able to significantly reduce the effect of inter-partition interference without any additional functional overhead. Furthermore, pFTree also provides improved load-balancing over the de facto standard IB fat-tree routing algorithm.
Keywords :
internetworking; resource allocation; telecommunication network routing; telecommunication traffic; tree data structures; IB fabrics; IB fat-tree routing algorithm; InfiniBand-based multitenant clusters; high-performance computing systems; interpartition interference; link level; load-balancing; network fabric; network interconnect; network isolation improvement; network performance; network resources; network-wide partition isolation; node isolation; node partitioning; pFTree algorithm; partition boundaries; partition-aware fat-tree routing algorithm; route distribution; tenant groups; tenant workload; traffic flows; virtual lanes; Bandwidth; Fabrics; Partitioning algorithms; Ports (Computers); Quality of service; Routing; Topology; InfiniBand; interconnection networks; performance isolation; routing; virtual channels;
Conference_Title :
Cluster, Cloud and Grid Computing (CCGrid), 2015 15th IEEE/ACM International Symposium on
Conference_Location :
Shenzhen
DOI :
10.1109/CCGrid.2015.96