Title :
Tolerating faults in a mesh with a row of spare nodes
Author :
Bruck, Jehoshua ; Cypher, Robert ; Ho, Ching-Tien
Author_Institution :
IBM Almaden Res. Center, San Jose, CA, USA
Abstract :
The authors present an efficient method for tolerating faults in a two-dimensional mesh architecture. The approach is based on adding spare components (nodes) and extra links (edges) such that the resulting architecture can be reconfigured as a mesh in the presence of faults. The cost of the fault-tolerant mesh architecture is optimized by adding about one row of redundant nodes in addition to a set of k spare nodes (while tolerating up to k node faults) and minimizing the number of links per node. The results are surprisingly efficient and seem to be practical for small values of k. The degree of the fault-tolerant architecture is k+5 for odd k and k+6 for even k. The results can be generalized to d-dimensional meshes such that the number of spare nodes is less than the length of the shortest axis plus k, and the degree of the fault-tolerant mesh is (d-1)k+d+3 when k is odd and (d-1)k+2d+2 when k is even.
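The degree formulas quoted in the abstract can be checked with a small sketch. The function name `fault_tolerant_degree` and the input restrictions (k >= 1, d >= 2) are assumptions made for illustration and do not come from the paper; only the two formulas are taken from the abstract.

```python
def fault_tolerant_degree(k: int, d: int = 2) -> int:
    """Node degree of the fault-tolerant d-dimensional mesh, per the abstract.

    k: number of node faults tolerated (assumed here to be at least 1).
    d: mesh dimension (assumed here to be at least 2).
    """
    if k < 1 or d < 2:
        raise ValueError("this sketch assumes k >= 1 and d >= 2")
    if k % 2 == 1:
        # Odd k: (d-1)k + d + 3
        return (d - 1) * k + d + 3
    # Even k: (d-1)k + 2d + 2
    return (d - 1) * k + 2 * d + 2
```

With d = 2 the general expressions reduce to the two-dimensional values stated earlier in the abstract: k+5 for odd k and k+6 for even k.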
Keywords :
fault tolerant computing; parallel architectures; fault tolerance; fault-tolerant architecture; fault-tolerant mesh; spare components; two-dimensional mesh architecture; Computer architecture; Cost function; Fabrication; Fault tolerance; Joining processes; Large scale integration; Microprocessors; Parallel machines; Redundancy; Switches
Conference_Title :
Parallel and Distributed Processing, 1992. Proceedings of the Fourth IEEE Symposium on
Conference_Location :
Arlington, TX
Print_ISBN :
0-8186-3200-3
DOI :
10.1109/SPDP.1992.242768