DocumentCode :
2989195
Title :
Fault-Tolerant Flow Control in On-chip Networks
Author :
Kang, Young Hoon ; Kwon, Taek-Jun ; Draper, Jeffrey
Author_Institution :
Inf. Sci. Inst., Univ. of Southern California, Los Angeles, CA, USA
fYear :
2010
fDate :
3-6 May 2010
Firstpage :
79
Lastpage :
86
Abstract :
Scaling of interconnects exacerbates the already challenging reliability of on-chip networks. Although many researchers have provided various fault handling techniques in chip multi-processors (CMPs), the fault-tolerance of the interconnection network is yet to adequately evolve. As an end-to-end recovery approach delays fault detection and complicates recovery to a consistent global state in such a system, a link-level retransmission is endorsed for recovery, making a higher-level protocol simple. In this paper, we introduce a fault-tolerant flow control scheme for soft error handling in on-chip networks. The fault-tolerant flow control recovers errors at a link-level by requesting retransmission and ensures an error-free transmission on a flit-basis with incorporation of dynamic packet fragmentation. Dynamic packet fragmentation is adopted as a part of fault-tolerant flow control to disengage flits from the fault-containment and recover the faulty flit transmission. Thus, the proposed router provides a high level of dependability at the link-level for both datapath and control planes. In simulation with injected faults, the proposed router is observed to perform well, gracefully degrading while exhibiting 97% error coverage in datapath elements. The proposed router has been implemented using a TSMC 45 nm standard cell library. As compared to a router which employs triple modular redundancy (TMR) in datapath elements, the proposed router takes 58% less area and consumes 40% less energy per packet on average.
Keywords :
fault tolerant computing; multiprocessing systems; network routing; network-on-chip; TSMC standard cell library; chip multiprocessors; dynamic packet fragmentation; fault handling techniques; fault-containment; fault-tolerant flow control; faulty flit transmission; interconnection network; link-level retransmission; on-chip networks; router; size 45 nm; triple modular redundancy; Degradation; Delay; Error correction; Fault detection; Fault tolerance; Libraries; Multiprocessor interconnection networks; Network-on-a-chip; Protocols; Redundancy; fault-tolerant router; networks-on-chip; soft-error handling;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Networks-on-Chip (NOCS), 2010 Fourth ACM/IEEE International Symposium on
Conference_Location :
Grenoble
Print_ISBN :
978-1-4244-7085-3
Electronic_ISBN :
978-1-4244-7086-0
Type :
conf
DOI :
10.1109/NOCS.2010.18
Filename :
5507558