Title :
A fast runtime fault recovery approach for NoC-based MPSoCs for performance constrained applications
Author :
Wachter, Eduardo ; Erichsen, Augusto ; Juracy, Leonardo ; Amory, Alexandre ; Moraes, Fernando G.
Author_Institution :
FACIN, PUCRS, Porto Alegre, Brazil
Abstract :
Mechanisms for runtime fault tolerance in Multi-Processor Systems-on-Chip (MPSoCs) are mandatory to cope with transient and permanent faults. This issue is even more relevant in nanotechnologies due to process variability, aging effects, and susceptibility to upsets, among other factors. The literature presents isolated solutions to deal with faults in the MPSoC communication infrastructure. In this context, one gap to be filled is the integration of all layers into a single solution that copes with Network-on-Chip (NoC) faults from the physical layer up to the application layer. The goal of this work is to present a runtime integrated approach to cope with NoC faults in MPSoCs. The original contribution is a set of hardware and software mechanisms that ensure both efficient and reliable communication in NoC-based MPSoCs. The proposal has an acceptable silicon area overhead and a small memory footprint. In the experiments, benchmarks (synthetic and real MPSoC applications) were simulated with thousands of random fault injections, and all of them executed correctly. Moreover, the average application execution time overhead is lower than 0.5%. These results suggest that the proposed fault-tolerant method could be used in applications with reliability and performance constraints.
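The headline figure above (average execution time overhead below 0.5%) can be read as the mean, over the benchmark applications, of each application's relative slowdown when running with the fault-tolerant mechanisms versus a baseline run. The record does not state the exact definition, so the sketch below assumes per-application overhead is (T_ft - T_base) / T_base and that the average is a plain arithmetic mean. The function name `mean_relative_overhead` and the timings are hypothetical illustrations, not data from the paper.

```python
def mean_relative_overhead(runs):
    """Mean relative execution-time overhead, in percent.

    Assumption (not stated in the record): overhead per application is
    (ft_time - baseline_time) / baseline_time, averaged arithmetically.

    runs: list of (baseline_time, ft_time) pairs, one per application,
    in any consistent unit (e.g. clock cycles).
    """
    if not runs:
        raise ValueError("need at least one application")
    for base, _ in runs:
        if base <= 0:
            raise ValueError("baseline time must be positive")
    overheads = [(ft - base) / base for base, ft in runs]
    return 100.0 * sum(overheads) / len(overheads)


# Hypothetical timings in clock cycles, for illustration only.
example = [(1_000_000, 1_004_000), (2_500_000, 2_507_500)]
print(f"{mean_relative_overhead(example):.2f}%")  # prints "0.35%"
```

Averaging relative (rather than absolute) overheads keeps short and long applications on equal footing; a cycle-weighted average would instead be dominated by the longest benchmark.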
Keywords :
fault tolerant computing; integrated circuit reliability; multiprocessor interconnection networks; network-on-chip; NoC-based MPSoC; aging effects; fast runtime fault recovery approach; memory footprint; multiprocessor system-on-chips; nanotechnologies; performance constrained applications; permanent faults; process variability; random fault injections; silicon area overhead; transient faults; Fault tolerance; Fault tolerant systems; Ports (Computers); Program processors; Protocols; Routing; System recovery; fault recovery; fault-tolerant NoCs; fault-tolerant communication;
Conference_Title :
Integrated Circuits and Systems Design (SBCCI), 2014 27th Symposium on
Conference_Location :
Aracaju
DOI :
10.1145/2660540.2660986