DocumentCode
1684101
Title
Fault tolerance with shortest paths in regular and irregular networks
Author
Sem-Jacobsen, Frank Olaf ; Lysne, Olav
Author_Institution
Dept. of Inf., Univ. of Oslo, Oslo
fYear
2008
Firstpage
1
Lastpage
11
Abstract
Fault tolerance has become an important part of current supercomputers. Local dynamic fault tolerance, the most expedient way of tolerating faults, preconfigures the network with multiple paths from every node/switch to every destination. In this paper we present a local shortest-path dynamic fault-tolerance mechanism, inspired by a solution developed for the Internet, that can be applied to any shortest-path routing algorithm, such as dimension-ordered routing, fat-tree routing, or layered shortest path, and we provide a solution for achieving deadlock freedom in the presence of faults. Simulation results show that 1) for fat trees this yields the highest throughput to date and the lowest requirements on virtual layers for dynamic one-fault tolerance, 2) in general few layers are required to achieve deadlock freedom, and 3) for irregular topologies it gives up to a tenfold performance increase compared to FRoots.
Keywords
software fault tolerance; Internet; deadlock freedom; fault tolerance; shortest paths; Computer networks; Fault tolerance; Fault tolerant systems; Heuristic algorithms; Network topology; Routing; Supercomputers; Switches; System recovery; Telecommunication traffic
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel and Distributed Processing, 2008. IPDPS 2008. IEEE International Symposium on
Conference_Location
Miami, FL
ISSN
1530-2075
Print_ISBN
978-1-4244-1693-6
Electronic_ISBN
1530-2075
Type
conf
DOI
10.1109/IPDPS.2008.4536280
Filename
4536280