• DocumentCode
    3319880
  • Title
    A context saving fault tolerant approach for a shared memory many-core architecture
  • Author
    Wachter, Eduardo ; Ventroux, Nicolas ; Moraes, Fernando G.
  • Author_Institution
    FACIN, PUCRS, Porto Alegre, Brazil
  • fYear
    2015
  • fDate
    24-27 May 2015
  • Firstpage
    1570
  • Lastpage
    1573
  • Abstract
    Mechanisms for runtime fault-tolerance in many-core architectures are mandatory to cope with transient and permanent faults. This issue is even more relevant at aggressive technology nodes due to process variability, aging effects, and susceptibility to upsets, among other factors. This work proposes periodically saving the context and, on error detection, rescheduling tasks from the last known reliable state while avoiding the faulty processor. The technique is implemented on an embedded multicore architecture named P2012. The proposed fault-tolerant approach induces a limited overhead of 9.37% in an industrial image processing application while guaranteeing full error recovery if any error is detected.
  • Keywords
    embedded systems; fault tolerance; multiprocessing systems; system recovery; system-on-chip; P2012 embedded multicore architecture; aging effects; context saving fault tolerant approach; full-error recovery; industrial image processing; permanent faults; process variability; reschedule tasks; runtime fault-tolerance; shared memory many-core architecture; transient faults; upset susceptibility; Computer architecture; Context; Fault tolerance; Fault tolerant systems; Hardware; Software; Synchronization; NoC-based MPSoC; checkpointing; context saving; fault recovery; rollback;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Circuits and Systems (ISCAS), 2015 IEEE International Symposium on
  • Conference_Location
    Lisbon
  • Type
    conf
  • DOI
    10.1109/ISCAS.2015.7168947
  • Filename
    7168947