DocumentCode :
28554
Title :
Addressing Transient and Permanent Faults in NoC With Efficient Fault-Tolerant Deflection Router
Author :
Chaochao Feng ; Zhonghai Lu ; Axel Jantsch ; Minxuan Zhang ; Zuocheng Xing
Author_Institution :
Sch. of Comput., Nat. Univ. of Defense Technol., Changsha, China
Volume :
21
Issue :
6
fYear :
2013
fDate :
June 2013
Firstpage :
1053
Lastpage :
1066
Abstract :
The continuing decrease in the feature size of integrated circuits increases their susceptibility to transient and permanent faults. This paper proposes a fault-tolerant solution for a bufferless network-on-chip, including an on-line fault-diagnosis mechanism to detect both transient and permanent faults, a hybrid automatic repeat request and forward error correction link-level error control scheme to handle transient faults, and a reinforcement-learning-based fault-tolerant deflection routing (FTDR) algorithm to tolerate permanent faults without deadlock and livelock. A hierarchical-routing-table-based algorithm (FTDR-H) is also presented to reduce the area overhead of the FTDR router. Synthesis results show that, compared with the FTDR router, the FTDR-H router can reduce the area by 27% in an 8×8 network. Simulation results demonstrate that under synthetic workloads, in the presence of permanent link faults, the throughput of an 8×8 network with the FTDR and FTDR-H algorithms is 14% and 23% higher on average than that with the fault-on-neighbor (FoN) aware deflection routing algorithm and the cost-based deflection routing algorithm, respectively. Under real application workloads, the FTDR-H algorithm achieves a 20% lower hop count on average than the FoN algorithm. For transient faults, the performance of the FTDR router degrades gracefully even at a high fault rate. We also implement the fault-tolerant deflection router, which can achieve 400 MHz in TSMC 65-nm technology.
Keywords :
automatic repeat request; electronic engineering computing; fault diagnosis; fault tolerance; forward error correction; integrated circuit reliability; learning (artificial intelligence); network routing; network-on-chip; FTDR-H algorithm; FTDR-H router; FoN aware deflection routing algorithm; NoC; TSMC technology; bufferless network-on-chip; cost-based deflection routing algorithm; fault rate; fault-on-neighbor aware deflection routing algorithm; fault-tolerant deflection router; fault-tolerant solution; forward error correction link-level error control scheme; frequency 400 MHz; hierarchical-routing-table-based algorithm; hop count; hybrid automatic repeat request; integrated circuit feature size; network throughput; online fault-diagnosis mechanism; permanent fault detection; permanent link fault; reinforcement-learning-based fault-tolerant deflection routing algorithm; size 65 nm; synthetic workload; transient fault detection; Automatic repeat request; Circuit faults; Encoding; Fault tolerance; Fault tolerant systems; Routing; Transient analysis; Deflection routing; fault-tolerance; on-line fault diagnosis; permanent fault; transient fault;
fLanguage :
English
Journal_Title :
Very Large Scale Integration (VLSI) Systems, IEEE Transactions on
Publisher :
IEEE
ISSN :
1063-8210
Type :
jour
DOI :
10.1109/TVLSI.2012.2204909
Filename :
6255806