DocumentCode
3319880
Title
A context saving fault tolerant approach for a shared memory many-core architecture
Author
Wachter, Eduardo ; Ventroux, Nicolas ; Moraes, Fernando G.
Author_Institution
FACIN, PUCRS, Porto Alegre, Brazil
fYear
2015
fDate
24-27 May 2015
Firstpage
1570
Lastpage
1573
Abstract
Runtime fault-tolerance mechanisms are mandatory in many-core architectures to cope with transient and permanent faults. The issue becomes even more relevant in aggressive technology nodes because of process variability, aging effects, and susceptibility to upsets, among other factors. This work proposes periodically saving the task context and, when an error is detected, re-scheduling tasks from the last known reliable state while avoiding the faulty processor. The technique is implemented on P2012, an embedded multicore architecture. The proposed fault-tolerant approach induces a limited overhead of 9.37% in an industrial image processing application while guaranteeing full error recovery whenever an error is detected.
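The abstract describes a checkpoint-and-rollback scheme: save a task's context periodically, and on error detection restore the last saved context on a processor other than the faulty one. The sketch below illustrates that general idea only; it is not the authors' P2012 implementation. The names `Task`, `CheckpointScheduler`, `period`, and the `faults` set of `(processor, step)` pairs are hypothetical, chosen for illustration, and fault detection is simulated rather than performed by hardware.

```python
import copy
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical task whose whole context is a step counter and an accumulator."""
    name: str
    total_steps: int
    step: int = 0
    acc: int = 0

    def run_step(self) -> None:
        # One unit of deterministic work: accumulate the current step index.
        self.acc += self.step
        self.step += 1

    @property
    def done(self) -> bool:
        return self.step >= self.total_steps


class CheckpointScheduler:
    """Sketch of periodic context saving with rollback and re-scheduling (assumed design)."""

    def __init__(self, processors, period):
        self.healthy = list(processors)  # processors still considered reliable
        self.period = period             # steps between two context saves
        self.checkpoints = {}            # task name -> last saved context
        self.rollbacks = 0               # number of recoveries performed

    def save_context(self, task):
        # Store an independent copy so later execution cannot corrupt it.
        self.checkpoints[task.name] = copy.deepcopy(task)

    def rollback(self, task):
        # Return a fresh copy of the last known reliable context.
        return copy.deepcopy(self.checkpoints[task.name])

    def run(self, task, faults):
        """Run task to completion; faults is a set of simulated (processor, step) errors."""
        proc = self.healthy[0]
        self.save_context(task)
        while not task.done:
            if (proc, task.step) in faults:
                # Error detected: exclude the faulty processor and restart the
                # task from its last checkpoint on a remaining processor.
                self.healthy.remove(proc)
                if not self.healthy:
                    raise RuntimeError("no healthy processor left")
                proc = self.healthy[0]
                task = self.rollback(task)
                self.rollbacks += 1
                continue
            task.run_step()
            if task.step % self.period == 0:
                self.save_context(task)
        return task, proc
```

For example, `CheckpointScheduler(["pe0", "pe1", "pe2"], period=4).run(Task("filter", 10), faults={("pe0", 6)})` rolls the task back to its step-4 checkpoint, finishes it on `pe1`, and still produces the fault-free result (`acc == 45`). The steps re-executed after a rollback, plus the copying done at every save, are the kind of cost that a checkpoint period trades off against recovery latency.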
Keywords
embedded systems; fault tolerance; multiprocessing systems; system recovery; system-on-chip; P2012 embedded multicore architecture; aging effects; context saving fault tolerant approach; full-error recovery; industrial image processing; permanent faults; process variability; reschedule tasks; runtime fault-tolerance; shared memory many-core architecture; transient faults; upset susceptibility; Computer architecture; Context; Fault tolerance; Fault tolerant systems; Hardware; Software; Synchronization; NoC-based MPSoC; checkpointing; context saving; fault recovery; rollback;
fLanguage
English
Publisher
IEEE
Conference_Titel
Circuits and Systems (ISCAS), 2015 IEEE International Symposium on
Conference_Location
Lisbon
Type
conf
DOI
10.1109/ISCAS.2015.7168947
Filename
7168947