• DocumentCode
    1056176
  • Title
    Distributed, deadlock-free routing in faulty, pipelined, direct interconnection networks
  • Author
    Gaughan, Patrick T. ; Dao, Binh V. ; Yalamanchili, Sudhakar ; Schimmel, David E.
  • Author_Institution
    Dept. of Electr. Eng., Alabama Univ., Tuscaloosa, AL, USA
  • Volume
    45
  • Issue
    6
  • fYear
    1996
  • fDate
    6/1/1996
  • Firstpage
    651
  • Lastpage
    665
  • Abstract
    This paper focuses on designing high-performance pipelined networks that can operate in the presence of dynamic component failures. A general, rigorous framework for deadlock-free communication in faulty, pipelined networks is developed. A mechanism is also proposed for recovering from dynamic link and node failures. The recovery mechanism (1) is fully distributed, (2) does not require timeouts, (3) prevents fault-induced deadlock, and (4) is integrated into the virtual channel flow control mechanisms. This recovery mechanism is used to develop a new pipelined communication mechanism: acknowledged pipelined circuit-switching (APCS). This mechanism supports existing routing protocols that can tolerate a maximal number of static link failures, i.e., one less than the number of ports on a node. An implementation of a novel router architecture is described and the results of detailed flit-level simulations are presented. Finally, the proposed recovery mechanism is shown to be applicable to existing adaptive wormhole routing protocols, which are prone to deadlock in the presence of dynamic faults.
  • Keywords
    fault tolerant computing; multiprocessor interconnection networks; network routing; protocols; acknowledged pipelined circuit-switching; adaptive wormhole routing protocols; deadlock-free communication; deadlock-free routing; direct interconnection networks; distributed routing; dynamic component failures; dynamic link; fault-induced deadlock; high performance pipelined networks; node failures; pipelined communication mechanism; recovery mechanism; router architecture; routing protocols; static link failures; virtual channel flow control mechanisms; Circuit faults; Fault tolerance; Intelligent networks; Laboratories; Multiprocessor interconnection networks; Personal communication networks; Routing protocols; Senior members; Student members; System recovery
  • fLanguage
    English
  • Journal_Title
    Computers, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9340
  • Type
    jour
  • DOI
    10.1109/12.506422
  • Filename
    506422