Title :
Fault recovery communication protocol for NoC-based MPSoCs
Author :
Wachter, Eduardo W. ; Amory, Alexandre M. ; Moraes, Fernando G.
Author_Institution :
Dept. of Inf., Univ. of Santa Cruz do Sul, Santa Cruz, Brazil
Abstract :
Fault-tolerance mechanisms are mandatory in MPSoCs to cope with faults that arise during fabrication or over the product lifetime. For instance, permanent faults in the interconnect network can stall or crash applications even though the network still has alternative fault-free paths to a given destination. This PhD work presents a fault-tolerant communication protocol that takes advantage of the NoC routing method to provide alternative paths between any source-target pair of processors. At the application layer, the method appears as a typical MPI-like message passing protocol. At the lower layers, it consists of a software kernel layer that monitors the regularity of message exchanges between pairs of tasks. If a message is not delivered within a certain time, the software triggers the path finding mechanism, which guarantees complete network reachability. The proposed approach determines new paths quickly, and its costs in extra silicon area and memory usage are small.
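The timeout-then-reroute idea in the abstract can be illustrated with a minimal sketch. It assumes a 2D mesh with XY routing as the default path, a tick-counting watchdog, and a breadth-first search as a stand-in for the path finding mechanism; the abstract does not specify any of these details, and every name below (xy_route, find_path, send_with_recovery, timeout_ticks) is hypothetical.

```python
from collections import deque


def xy_route(src, dst):
    """Assumed default route: move along x first, then along y."""
    (x, y), path = src, [src]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def path_is_healthy(path, faulty_links):
    """True if no hop of the path crosses a link marked as faulty."""
    return all(frozenset(hop) not in faulty_links for hop in zip(path, path[1:]))


def find_path(src, dst, width, height, faulty_links):
    """Stand-in path search (plain BFS over fault-free links).

    Returns a list of routers from src to dst, or None if dst is unreachable.
    """
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            in_mesh = 0 <= nxt[0] < width and 0 <= nxt[1] < height
            link_ok = frozenset((node, nxt)) not in faulty_links
            if in_mesh and link_ok and nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


def send_with_recovery(src, dst, width, height, faulty_links, timeout_ticks=10):
    """Try the default route; on a simulated timeout, search for a new path.

    Returns (path, recovered), where recovered tells whether the search ran.
    """
    path = xy_route(src, dst)
    if path_is_healthy(path, faulty_links):
        return path, False  # message delivered on the default route
    # Simulated watchdog: the message was lost, so the kernel waits until
    # the (assumed) timeout expires before starting the path search.
    elapsed = 0
    while elapsed < timeout_ticks:
        elapsed += 1
    return find_path(src, dst, width, height, faulty_links), True
```

For example, on a 3x3 mesh with the link between routers (1, 0) and (2, 0) marked faulty, send_with_recovery((0, 0), (2, 0), 3, 3, {frozenset({(1, 0), (2, 0)})}) falls back to the search and returns a four-hop detour that avoids the faulty link. A breadth-first search was chosen here only because it is simple and always finds a path when one exists; it is not claimed to be the mechanism used in the paper.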
Keywords :
fault tolerance; integrated circuit reliability; message passing; network routing; network-on-chip; MPI-like message passing protocol; MPSOC; NOC routing method; fault recovery communication protocol; fault tolerant communication protocol; interconnect network; multiprocessor system-on-chip; path finding mechanism; software kernel layer; source-target pair; Fault tolerance; Fault tolerant systems; Kernel; Message passing; Protocols; Routing; NoC-based MPSoC; fault-tolerant communication protocol;
Conference_Title :
VLSI (ISVLSI), 2013 IEEE Computer Society Annual Symposium on
Conference_Location :
Natal
DOI :
10.1109/ISVLSI.2013.6654648