• DocumentCode
    1684101
  • Title

    Fault tolerance with shortest paths in regular and irregular networks

  • Author

    Sem-Jacobsen, Frank Olaf ; Lysne, Olav

  • Author_Institution
    Dept. of Inf., Univ. of Oslo, Oslo
  • fYear
    2008
  • Firstpage
    1
  • Lastpage
    11
  • Abstract
    Fault tolerance has become an important part of current supercomputers. Local dynamic fault tolerance is the most expedient way of tolerating faults by preconfiguring the network with multiple paths from every node/switch to every destination. In this paper we present a local shortest-path dynamic fault-tolerance mechanism, inspired by a solution developed for the Internet, that can be applied to any shortest-path routing algorithm, such as dimension-ordered routing, fat-tree routing, or layered shortest path, and we provide a solution for achieving deadlock freedom in the presence of faults. Simulation results show that 1) for fat trees this yields the highest throughput and the lowest virtual-layer requirements reported to date for dynamic one-fault tolerance, 2) in general, few layers are required to achieve deadlock freedom, and 3) for irregular topologies it gives up to a tenfold performance increase compared to FRoots.
  • Keywords
    software fault tolerance; Internet; deadlock freedom; fault tolerance; shortest paths; Computer networks; Fault tolerance; Fault tolerant systems; Heuristic algorithms; Network topology; Routing; Supercomputers; Switches; System recovery; Telecommunication traffic;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Processing, 2008. IPDPS 2008. IEEE International Symposium on
  • Conference_Location
    Miami, FL
  • ISSN
    1530-2075
  • Print_ISBN
    978-1-4244-1693-6
  • Electronic_ISBN
    1530-2075
  • Type

    conf

  • DOI
    10.1109/IPDPS.2008.4536280
  • Filename
    4536280