Title :
Fault tolerance for an embedded wormhole switched network
Author :
Hotchkiss, R. ; O'Neill, B.C. ; Clark, S.
Author_Institution :
Dept. of Electr. & Electron. Eng., Nottingham Trent Univ., UK
Abstract :
The effectiveness of parallel and distributed systems depends heavily upon the reliability and efficiency of the method used for information transfer. To satisfy these requirements, the communication medium must supply fault tolerance throughout the communication layers while minimising operational overheads. The work described relates to a scalable communication system for a distributed-memory parallel processing architecture, which is constructed with message routing switches. The system employs a hardware mechanism that is local to each physical connection and provides a distributed solution for fault detection and isolation. By isolating faults and using adaptive routing algorithms, networks may be designed that remain operable in the presence of faults. An explanation of the basic switch and fault isolation mechanism is provided. The paper concludes with implementation details of the operational hardware and of the environment in which it has been tested.
Keywords :
distributed memory systems; fault tolerant computing; message passing; multiprocessor interconnection networks; network routing; parallel architectures; adaptive routing algorithms; distributed systems; distributed-memory parallel processing architecture; embedded wormhole switched network; fault detection; fault tolerance; information transfer; message routing switches; operational overhead; parallel systems; scalable communication system; Adaptive systems; Algorithm design and analysis; Communication switching; Fault detection; Fault tolerance; Hardware; Parallel processing; Routing; Switches; Telecommunication network reliability
Conference_Title :
Parallel Computing in Electrical Engineering, 2000. PARELEC 2000. Proceedings. International Conference on
Conference_Location :
Trois-Rivieres, Que.
Print_ISBN :
0-7695-0759-X
DOI :
10.1109/PCEE.2000.873606