DocumentCode :
125514
Title :
Multi-homed Fat-Tree Routing with InfiniBand
Author :
Bogdanski, Bartosz ; Johnsen, Bjorn Dag ; Reinemo, Sven-Arne
Author_Institution :
Oracle Corp., Oslo, Norway
fYear :
2014
fDate :
12-14 Feb. 2014
Firstpage :
122
Lastpage :
129
Abstract :
For clusters where the topology consists of a fat-tree, or of more than one fat-tree combined into one subnet, there are several properties that the routing algorithms should support beyond what exists today. One of the missing properties is that the current fat-tree routing algorithm does not guarantee that each port on a multi-homed node is routed through redundant spines, even if these ports are connected to redundant leaves. As a consequence, in case of a spine failure, there is a small window during which the node is unreachable until the subnet manager has rerouted to another spine. In this paper, we discuss the need for independent routes for multi-homed nodes in fat-trees by providing real-life examples in which a single point of failure leads to a complete outage of a multi-port node. We present and implement methods that may be used to alleviate this problem, and we perform simulations that demonstrate improvements in the performance, scalability, availability and predictability of InfiniBand fat-tree topologies. We show that our methods not only increase performance by up to 52.6%, but also, and more importantly, that there is no downtime associated with spine switch failure.
Keywords :
computer network performance evaluation; system recovery; telecommunication network routing; telecommunication network topology; telecommunication switching; trees (mathematics); workstation clusters; InfiniBand; availability improvement; multihomed fat-tree routing algorithm; multihomed node; multiport node; performance improvement; predictability improvement; redundant leaves; redundant spines; scalability improvement; spine switch failure; Algorithm design and analysis; Fabrics; Ports (Computers); Proposals; Routing; System recovery; Topology; fat-tree; mFtree; multihoming; routing; two-port nodes;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel, Distributed and Network-Based Processing (PDP), 2014 22nd Euromicro International Conference on
Conference_Location :
Torino
ISSN :
1066-6192
Type :
conf
DOI :
10.1109/PDP.2014.22
Filename :
6787262