• DocumentCode
    2014467
  • Title

    Lifetime Reliability-Aware Checkpointing Mechanism: Modelling and Analysis

  • Author

    bin Bandan, Mohamad Imran ; Bhattacharjee, Sangeeta ; Shafik, Rishad Ahmed ; Pradhan, D.K. ; Mathew, Jinesh

  • Author_Institution
    Univ. of Bristol, Bristol, UK
  • fYear
    2013
  • fDate
    10-12 Dec. 2013
  • Firstpage
    128
  • Lastpage
    132
  • Abstract
    Checkpointing mechanisms are used to tolerate the impact of transient faults by rolling back to a previously saved system state. In this paper, we propose a novel checkpointing mechanism that considers fault tolerance in a duplex system in the presence of both transient and permanent faults. The main objective of our proposed mechanism is to extend the lifetime reliability of the duplex system by avoiding or even tolerating permanent faults in microprocessors. In addition, we propose to migrate tasks from a 'near-to-die' processor to a spare processor when the current Mean-Time-To-Failure (MTTF) value is less than or equal to a pre-determined threshold MTTF value. We validate our proposed mechanism and perform overhead analysis using various case studies. We then compare it with one of the most popular existing checkpointing mechanisms, namely the roll-forward checkpointing scheme [9]. We show that, unlike roll-back or roll-forward mechanisms, our proposed mechanism gives significantly higher lifetime reliability with reasonable system overheads.
  • Keywords
    integrated circuit reliability; microprocessor chips; MTTF; duplex system lifetime reliability; lifetime reliability-aware checkpointing mechanism; mean-time-to-failure; microprocessors; near-to-die processor; permanent faults; roll-forward check pointing scheme; transient faults; Built-in self-test; Checkpointing; Circuit faults; Registers; Reliability engineering; Transient analysis; fault tolerance; lifetime reliability
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Electronic System Design (ISED), 2013 International Symposium on
  • Conference_Location
    Singapore
  • Print_ISBN
    978-0-7695-5143-2
  • Type

    conf

  • DOI
    10.1109/ISED.2013.32
  • Filename
    6808655