• DocumentCode
    1761082
  • Title
    Design and Evaluation of Confidence-Driven Error-Resilient Systems
  • Author
    Chen, Chia-Hsiang; Blaauw, D.; Sylvester, Dennis; Zhang, Zhengya
  • Author_Institution
    Dept. of Electr. Eng. & Comput. Sci., Univ. of Michigan, Ann Arbor, MI, USA
  • Volume
    22
  • Issue
    8
  • fYear
    2014
  • fDate
    Aug. 2014
  • Firstpage
    1727
  • Lastpage
    1737
  • Abstract
    Deeply scaled CMOS circuits are increasingly susceptible to transient faults and soft errors; emerging post-CMOS devices can be more vulnerable, sometimes exhibiting erratic errors of arbitrary duration. Applying timing and supply voltage margins is wasteful and is becoming ineffective, and conventional checking and sparing techniques provide only limited coverage against widely varying errors. We propose a confidence-driven computing (CDC) model for adaptive protection against nondeterministic errors. The CDC model employs fine-grained temporal redundancy and confidence checking for faster adaptation and tunable reliability. The CDC model can be extended to deeply scaled CMOS circuits that are mainly affected by transient faults and soft errors, where an early checking (EC) technique can be used to perform independent error checking for more flexibility and better performance. To evaluate the CDC model, we apply a sample-based field-programmable gate array emulation along with real-time error injection. The CDC model is shown to adapt to fluctuating error rates and to enhance system reliability by effectively trading off performance. To evaluate the EC technique at a finer time scale, we create a new event-based simulation to capture the path delay distribution, the error model, and their interactions. The EC technique improves system reliability by more than four orders of magnitude when errors are of short duration. Both the CDC model and the EC technique are synthesized in a 45-nm CMOS technology for cost estimates: 1) the area overhead is as low as 12% and 2) the energy overhead can be limited to 19%.
  • Keywords
    CMOS integrated circuits; error detection; field programmable gate arrays; integrated circuit reliability; radiation hardening (electronics); semiconductor device reliability; transients; CMOS circuits; adaptation reliability; adaptive protection; arbitrary duration; checking techniques; confidence-driven computing model; confidence-driven error-resilient systems; delay distribution; erratic errors; error coverage; fine-grained temporal redundancy; fluctuating error rates; nondeterministic errors; real-time error injection; sample-based field-programmable gate array emulation; size 45 nm; soft errors; sparing techniques; supply voltage margin; system reliability; time scale; timing margin; transient faults; tunable reliability; Delays; Emulation; Error analysis; Redundancy; Semiconductor device modeling; Synchronization; Error detection; error simulation; field-programmable gate array (FPGA) emulation; reliability; resilient design
  • fLanguage
    English
  • Journal_Title
    Very Large Scale Integration (VLSI) Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1063-8210
  • Type
    jour
  • DOI
    10.1109/TVLSI.2013.2277351
  • Filename
    6585814