• DocumentCode
    1499337
  • Title

    A Reliable Routing Architecture and Algorithm for NoCs

  • Author

    DeOrio, Andrew ; Fick, David ; Bertacco, Valeria ; Sylvester, Dennis ; Blaauw, David ; Hu, Jin ; Chen, Gregory

  • Author_Institution
    Univ. of Michigan, Ann Arbor, MI, USA
  • Volume
    31
  • Issue
    5
  • fYear
    2012
  • fDate
    5/1/2012 12:00:00 AM
  • Firstpage
    726
  • Lastpage
    739
  • Abstract
    Aggressive transistor scaling continues to drive increasingly complex digital designs. The large number of transistors available today enables the development of chip multiprocessors that include many cores on one die communicating through an on-chip interconnect. As the number of cores increases, scalable communication platforms, such as networks-on-chip (NoCs), have become more popular. However, as the sole communication medium, these interconnects are a single point of failure so that any permanent fault in the NoC can cause the entire system to fail. Compounding the problem, transistors have become increasingly susceptible to wear-out related failures as their critical dimensions shrink. As a result, the on-chip network has become a critically exposed unit that must be protected. To this end, we present Vicis, a fault-tolerant architecture and companion routing protocol that is robust to a large number of permanent failures, allowing communication to continue in the face of permanent transistor failures. Vicis makes use of a two-level approach. First, it attempts to work around errors within a router by leveraging reconfigurable architectural components. Second, when faults within a router disable a link's connectivity, or even an entire router, Vicis reroutes around the faulty node or link with a novel, distributed routing algorithm for meshes and tori. Tolerating permanent faults in both the router components and the reliability hardware itself, Vicis enables graceful performance degradation of networks-on-chip.
  • Keywords
    fault tolerance; integrated circuit metallisation; network routing; network-on-chip; routing protocols; NoC; complex digital designs; distributed routing algorithm; fault-tolerant architecture; networks-on-chip; on-chip interconnect; permanent transistor failures; reliability; reliable routing architecture; routing protocol; scalable communication platforms; sole communication medium; Built-in self-test; Circuit faults; Error correction codes; Registers; Reliability; Routing; Fault tolerance; hard faults; networks-on-chip (NoCs); reconfiguration; reliability; routing algorithms;
  • fLanguage
    English
  • Journal_Title
    Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0278-0070
  • Type

    jour

  • DOI
    10.1109/TCAD.2011.2181509
  • Filename
    6186857