DocumentCode
2502908
Title
Dynamic Fault Tolerance with Misrouting in Fat Trees
Author
Sem-Jacobsen, Frank Olaf ; Skeie, Tor ; Lysne, Olav ; Duato, José
Author_Institution
Dept. of Informatics, Oslo Univ.
fYear
2006
fDate
14-18 Aug. 2006
Firstpage
33
Lastpage
44
Abstract
Fault tolerance is critical for efficient utilisation of large computer systems. Dynamic fault tolerance allows the network to remain available through the occurrence of faults, as opposed to static fault tolerance, which requires the network to be halted while it is reconfigured. Although dynamic fault tolerance may lead to less efficient solutions than static fault tolerance, it allows for much higher availability of the system. In this paper we devise a dynamic fault-tolerant adaptive routing algorithm for the fat tree, a widely used interconnect topology, which relies on misrouting around link faults. We show that the algorithm is guaranteed to tolerate any combination of fewer than (num_switch_ports)/2 link faults without the need for additional network resources for deadlock freedom. There is also a high probability of tolerating an even larger number of link faults. Simulation results show that network performance degrades very little when faults are dynamically tolerated.
Keywords
fault tolerant computing; multiprocessor interconnection networks; telecommunication network routing; trees (mathematics); dynamic fault tolerance; dynamic fault tolerant adaptive routing; fat tree; interconnect topology; link fault misrouting; network performance; Circuit faults; Computer networks; Distributed computing; Fault tolerance; Fault tolerant systems; Integrated circuit interconnections; Network topology; Routing; Switches; Switching circuits;
fLanguage
English
Publisher
ieee
Conference_Titel
Parallel Processing, 2006. ICPP 2006. International Conference on
Conference_Location
Columbus, OH
ISSN
0190-3918
Print_ISBN
0-7695-2636-5
Type
conf
DOI
10.1109/ICPP.2006.36
Filename
1690603
Link To Document