DocumentCode :
1680572
Title :
Making an SCI fabric dynamically fault tolerant
Author :
Stensland, Håkon Kvale ; Lysne, Olav ; Nordstrom, R. ; Kohmann, Hugo
Author_Institution :
Simula Res. Lab., Lysaker
fYear :
2008
Firstpage :
1
Lastpage :
8
Abstract :
In this paper we present a method for dynamic fault-tolerant routing in SCI networks, implemented on Dolphin Interconnect Solutions hardware. By dynamic fault tolerance, we mean that the interconnection network reroutes affected packets around a fault while the rest of the network remains fully functional. To the best of our knowledge, this is the first reported case of dynamic fault-tolerant routing available on commercial off-the-shelf interconnection network technology without duplicating hardware resources. The development focuses on a 2-D torus topology and is compatible with the existing hardware and software stack. We examine the existing mechanisms for routing in SCI. We describe how to let the nodes that detect the faulty component make routing decisions, and what changes to the existing routing are needed to support local rerouting. The new routing algorithm is tested on clusters with real hardware. Our tests show that distributed databases such as MySQL can run uninterrupted while the network reacts to faults. The solution is now part of the Dolphin Interconnect Solutions SCI driver, and hardware development to further decrease the reaction time is underway.
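The abstract does not spell out the routing algorithm. The sketch below is only a generic illustration of the core idea of local rerouting on a 2-D torus, not the paper's or Dolphin's implementation. It assumes dimension-order (X-then-Y) routing as the fault-free baseline and a breadth-first search as the detour strategy chosen by the node that detects a faulty link. It also models the torus with simple bidirectional links, abstracting away SCI's actual link-level structure and ignoring deadlock freedom. All function names are hypothetical.

```python
from collections import deque


def neighbors(node, width, height):
    """The four torus neighbours of node (x, y), with wrap-around."""
    x, y = node
    return [((x + 1) % width, y), ((x - 1) % width, y),
            (x, (y + 1) % height), (x, (y - 1) % height)]


def link(a, b):
    """Undirected link identifier (assumption: links fail in both directions)."""
    return frozenset((a, b))


def dimension_order_next(cur, dst, width, height):
    """Fault-free next hop: correct X first, then Y, taking the shorter way around each ring."""
    (cx, cy), (dx, dy) = cur, dst
    if cx != dx:
        step = 1 if (dx - cx) % width <= width // 2 else -1
        return ((cx + step) % width, cy)
    if cy != dy:
        step = 1 if (dy - cy) % height <= height // 2 else -1
        return (cx, (cy + step) % height)
    return cur


def detour(src, dst, width, height, faulty):
    """Shortest path from src to dst that avoids faulty links (BFS), or None if unreachable."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in neighbors(node, width, height):
            if nxt not in prev and link(node, nxt) not in faulty:
                prev[nxt] = node
                queue.append(nxt)
    if dst not in prev:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def route(src, dst, width, height, faulty):
    """Follow dimension-order routing until the next link is faulty, then detour locally."""
    path, cur = [src], src
    while cur != dst:
        nxt = dimension_order_next(cur, dst, width, height)
        if link(cur, nxt) in faulty:
            # Only the node adjacent to the fault changes its decision;
            # every other node keeps its normal (fault-free) routing.
            rest = detour(cur, dst, width, height, faulty)
            if rest is None:
                raise RuntimeError("destination unreachable")
            return path + rest[1:]
        path.append(nxt)
        cur = nxt
    return path
```

For example, on a 4x4 torus with the link between (1, 0) and (2, 0) marked faulty, `route((0, 0), (2, 1), 4, 4, {link((1, 0), (2, 0))})` sends the packet to (1, 0) as usual, where the detecting node diverts it through (1, 1) to reach (2, 1). Traffic that does not use the failed link is unaffected.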
Keywords :
software fault tolerance; software packages; 2D torus topology; Dolphin Interconnect Solutions hardware; SCI networks; distributed databases; fault tolerant routing; interconnection network; Clustering algorithms; Dolphins; Fabrics; Fault detection; Fault tolerance; Hardware; Multiprocessor interconnection networks; Network topology; Routing; Testing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel and Distributed Processing, 2008. IPDPS 2008. IEEE International Symposium on
Conference_Location :
Miami, FL
ISSN :
1530-2075
Print_ISBN :
978-1-4244-1693-6
Electronic_ISBN :
1530-2075
Type :
conf
DOI :
10.1109/IPDPS.2008.4536137
Filename :
4536137