DocumentCode :
1557021
Title :
A Dynamically Adjusting Gracefully Degrading Link-Level Fault-Tolerant Mechanism for NoCs
Author :
Vitkovskiy, Arseniy ; Soteriou, Vassos ; Nicopoulos, Chrysostomos
Author_Institution :
Dept. of Electr. & Comput. Eng. & Inf., Cyprus Univ. of Technol., Lemesos, Cyprus
Volume :
31
Issue :
8
fYear :
2012
Firstpage :
1235
Lastpage :
1248
Abstract :
The rapid scaling of silicon technology has enabled massive transistor integration densities. Nanometer feature sizes, however, are marred by increasing variability and susceptibility to wear-out. Billion-transistor designs, such as chip multiprocessors (CMPs), are especially vulnerable to defects. CMPs rely on a network-on-chip for all their communication needs. A single link failure within this on-chip fabric can impede, halt, or even deadlock intertile communication, which can render the entire chip multiprocessor useless. In this paper, we present a technique capable of handling very large numbers of permanent wire failures that occur in parallel links either at manufacture-time or at runtime (dynamically). As opposed to marking an entire parallel link as faulty whenever some wires fail, the proposed methodology employs these partially-faulty links (PFLs) to continue the transfer of information, albeit in a gracefully degraded mode, in order to maintain network connectivity. Furthermore, the presented technique can designate PFLs as fully-faulty when several wires fail, by utilizing appropriate routing algorithms that bypass nonoperational links, while still maintaining load-balance in the vicinity of PFLs. The proposed scheme employs architectural support within the on-chip routers to detect link failures and enable reconfiguration at the granularity of individual wires. Hardware synthesis confirms the low-cost nature of the proposed architecture, and full-system simulations using both synthetic network traffic and real workloads demonstrate its efficacy.
Keywords :
fault tolerance; microprocessor chips; network routing; network-on-chip; silicon; wires; CMP; NoC; PFL; billion-transistor designs; bypass nonoperational links; chip multiprocessors; dynamically adjusting gracefully degrading link-level fault-tolerant mechanism; full-system simulations; hardware synthesis; individual wire granularity; intertile communication; link failure detection; load-balance; massive transistor integration densities; nanometer feature sizes; network connectivity; on-chip fabric; on-chip routers; parallel links; partially-faulty links; permanent wire failures; routing algorithms; silicon technology; single link failure; synthetic network traffic; Algorithm design and analysis; Circuit faults; Clocks; Hardware; Routing; Vectors; Wires; Fault-tolerance; networks-on-chip (NoCs); on-chip interconnection networks; router microarchitecture; routing algorithm
fLanguage :
English
Journal_Title :
Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on
Publisher :
IEEE
ISSN :
0278-0070
Type :
jour
DOI :
10.1109/TCAD.2012.2188801
Filename :
6238398