DocumentCode :
1056176
Title :
Distributed, deadlock-free routing in faulty, pipelined, direct interconnection networks
Author :
Gaughan, Patrick T. ; Dao, Binh V. ; Yalamanchili, Sudhakar ; Schimmel, David E.
Author_Institution :
Dept. of Electr. Eng., Alabama Univ., Tuscaloosa, AL, USA
Volume :
45
Issue :
6
fYear :
1996
fDate :
6/1/1996
Firstpage :
651
Lastpage :
665
Abstract :
This paper focuses on designing high performance pipelined networks that can operate in the presence of dynamic component failures. A general, rigorous framework for deadlock-free communication in faulty, pipelined networks is developed. A mechanism is also proposed for recovering from dynamic link and node failures. The recovery mechanism (1) is fully distributed, (2) does not require timeouts, (3) prevents fault-induced deadlock, and (4) is integrated into the virtual channel flow control mechanisms. This recovery mechanism is used to develop a new pipelined communication mechanism: acknowledged pipelined circuit-switching (APCS). This mechanism supports existing routing protocols that can tolerate a maximal number of static link failures, i.e., one less than the number of ports on a node. An implementation of a novel router architecture is described and the results of detailed flit level simulations are presented. Finally, the proposed recovery mechanism is shown to be applicable to existing adaptive wormhole routing protocols which are prone to deadlock in the presence of dynamic faults.
Keywords :
fault tolerant computing; multiprocessor interconnection networks; network routing; protocols; acknowledged pipelined circuit-switching; adaptive wormhole routing protocols; deadlock-free communication; deadlock-free routing; direct interconnection networks; distributed routing; dynamic component failures; dynamic link; fault-induced deadlock; high performance pipelined networks; node failures; pipelined communication mechanism; recovery mechanism; router architecture; routing protocols; static link failures; virtual channel flow control mechanisms; Circuit faults; Fault tolerance; Intelligent networks; Laboratories; Multiprocessor interconnection networks; Personal communication networks; Routing protocols; Senior members; Student members; System recovery;
fLanguage :
English
Journal_Title :
Computers, IEEE Transactions on
Publisher :
IEEE
ISSN :
0018-9340
Type :
jour
DOI :
10.1109/12.506422
Filename :
506422