DocumentCode
1557021
Title
A Dynamically Adjusting Gracefully Degrading Link-Level Fault-Tolerant Mechanism for NoCs
Author
Vitkovskiy, Arseniy ; Soteriou, Vassos ; Nicopoulos, Chrysostomos
Author_Institution
Dept. of Electr. & Comput. Eng. & Inf., Cyprus Univ. of Technol., Lemesos, Cyprus
Volume
31
Issue
8
fYear
2012
Firstpage
1235
Lastpage
1248
Abstract
The rapid scaling of silicon technology has enabled massive transistor integration densities. Nanometer feature sizes, however, are marred by increasing variability and susceptibility to wear-out. Billion-transistor designs, such as chip multiprocessors (CMPs), are especially vulnerable to defects. CMPs rely on a network-on-chip for all their communication needs. A single link failure within this on-chip fabric can impede, halt, or even deadlock intertile communication, which can render the entire chip multiprocessor useless. In this paper, we present a technique capable of handling very large numbers of permanent wire failures that occur in parallel links either at manufacture-time or at runtime (dynamically). As opposed to marking an entire parallel link as faulty whenever some wires fail, the proposed methodology employs these partially-faulty links (PFLs) to continue the transfer of information, albeit in a gracefully degraded mode, in order to maintain network connectivity. Furthermore, the presented technique can designate PFLs as fully-faulty when several wires fail, by utilizing appropriate routing algorithms that bypass nonoperational links, while still maintaining load-balance in the vicinity of PFLs. The proposed scheme employs architectural support within the on-chip routers to detect link failures and enable reconfiguration at the granularity of individual wires. Hardware synthesis confirms the low-cost nature of the proposed architecture, and full-system simulations using both synthetic network traffic and real workloads demonstrate its efficacy.
Keywords
fault tolerance; microprocessor chips; network routing; network-on-chip; silicon; wires; CMP; NoC; PFL; billion-transistor designs; bypass nonoperational links; chip multiprocessors; dynamically adjusting gracefully degrading link-level fault-tolerant mechanism; full-system simulations; hardware synthesis; individual wire granularity; intertile communication; link failure detection; load-balance; massive transistor integration densities; nanometer feature sizes; network connectivity; network-on-chip; on-chip fabric; on-chip routers; parallel links; partially-faulty links; permanent wire failures; routing algorithms; silicon technology; single link failure; synthetic network traffic; Algorithm design and analysis; Circuit faults; Clocks; Hardware; Routing; Vectors; Wires; Fault-tolerance; networks-on-chip (NoCs); on-chip interconnection networks; router microarchitecture; routing algorithm;
fLanguage
English
Journal_Title
Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on
Publisher
IEEE
ISSN
0278-0070
Type
jour
DOI
10.1109/TCAD.2012.2188801
Filename
6238398