Title :
Effect of faulty network components on node availability
Author_Institution :
Scalable Syst. Div., Intel Corp., Hillsboro, OR, USA
Abstract :
Failure of network hardware components in parallel systems degrades the communication support provided to individual nodes. Limited communication capability may make one or more nodes unavailable to users. In this paper, we develop a framework to study the effect of faulty network components on node availability. We use the framework to investigate the specific case of wormhole-routed mesh networks under random faults and evaluate availability for various routing schemes. Results show that availability is poor in current networks but can be significantly improved by combining simple existing techniques, with minimal or no change in routing hardware complexity.
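The following Python sketch gives a rough illustration of the kind of availability evaluation the abstract describes. It estimates node availability in a k-by-k mesh by Monte Carlo simulation. It is not the authors' framework. It assumes that only bidirectional links fail, each independently with probability p, and that routing is deterministic dimension-order (XY) routing. It also uses a hypothetical availability criterion: a node counts as available only if it can exchange messages in both directions with every other node along a fault-free route. Letting a message fall back to YX order is a hypothetical example of combining simple routing techniques. All function names are illustrative.

```python
import random
from itertools import product


def mesh_links(k):
    """All undirected links of a k-by-k 2D mesh (no wraparound)."""
    links = []
    for x, y in product(range(k), repeat=2):
        if x + 1 < k:
            links.append(frozenset({(x, y), (x + 1, y)}))
        if y + 1 < k:
            links.append(frozenset({(x, y), (x, y + 1)}))
    return links


def dimension_order_links(src, dst, order):
    """Links used by dimension-order routing from src to dst.
    order=(0, 1) corrects x first (XY routing); order=(1, 0) is YX routing."""
    cur = list(src)
    links = []
    for dim in order:
        step = 1 if dst[dim] > cur[dim] else -1
        while cur[dim] != dst[dim]:
            nxt = list(cur)
            nxt[dim] += step
            links.append(frozenset({tuple(cur), tuple(nxt)}))
            cur = nxt
    return links


def available_nodes(k, faulty, orders):
    """Hypothetical criterion: a node is available if, for every other node,
    some allowed routing order gives a fault-free path in each direction."""
    nodes = list(product(range(k), repeat=2))

    def reachable(s, d):
        return any(faulty.isdisjoint(dimension_order_links(s, d, o)) for o in orders)

    return {
        s for s in nodes
        if all(reachable(s, d) and reachable(d, s) for d in nodes if d != s)
    }


def estimate_availability(k, p, orders, trials=200, seed=0):
    """Mean fraction of available nodes over random link-fault patterns
    (assumption: each link fails independently with probability p)."""
    rng = random.Random(seed)
    links = mesh_links(k)
    total = 0.0
    for _ in range(trials):
        faulty = {link for link in links if rng.random() < p}
        total += len(available_nodes(k, faulty, orders)) / (k * k)
    return total / trials


if __name__ == "__main__":
    XY = [(0, 1)]
    XY_OR_YX = [(0, 1), (1, 0)]  # hypothetical fallback: try XY, then YX
    for p in (0.01, 0.05):
        print(p,
              estimate_availability(5, p, XY, trials=100),
              estimate_availability(5, p, XY_OR_YX, trials=100))
```

Both schemes use the same seed, so they see identical fault patterns. As a result, the XY-or-YX estimate can never fall below the XY-only estimate. The gap between the two gives only a rough sense of how much a second routing order helps under this toy model.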
Keywords :
fault tolerant computing; hypercube networks; performance evaluation; communication support; faulty network components; network hardware components; node availability; parallel systems; routing hardware complexity; wormhole routed mesh networks; Application software; Availability; Bidirectional control; Fault tolerance; Hardware; Mesh networks; Network interfaces; Power system modeling; Supercomputers; Wires;
Conference_Title :
Computers and Communications, 1995., Conference Proceedings of the 1995 IEEE Fourteenth Annual International Phoenix Conference on
Conference_Location :
Scottsdale, AZ
Print_ISBN :
0-7803-2492-7
DOI :
10.1109/PCCC.1995.472500