• DocumentCode
    4073
  • Title

    Error-Resilient Design Techniques for Reliable and Dependable Computing

  • Author

    Das, Shidhartha ; Bull, David M. ; Whatmough, Paul N.

  • Author_Institution
    ARM Ltd., Cambridge, UK
  • Volume
    15
  • Issue
    1
  • fYear
    2015
  • fDate
    March 2015
  • Firstpage
    24
  • Lastpage
    34
  • Abstract
    Integrated circuits in modern systems-on-chip and microprocessors are typically operated with sufficient timing margins to mitigate the impact of rising process, voltage, and temperature (PVT) variations at advanced process nodes. The widening margins required for ensuring robust computation inevitably lead to conservative designs with unacceptable energy-efficiency overheads. Reconciling the conflicting objectives imposed by variation mitigation and energy-efficient computing will require fundamental departures from conventional circuit and system design practices. This paper posits error-resilient general-purpose computing as an effective approach for achieving this. We review resilient techniques that exploit tolerance to timing errors to automatically compensate for variations and dynamically tune a system to its most efficient operating point. We present the Razor approach as a pioneering example of such a technique. We present silicon measurement results from multiple industrial and academic demonstration systems that employ Razor dynamic voltage and frequency management. In particular, we highlight the application of Razor to two specific platforms. The first is an ARM-based industrial prototype where Razor dynamic adaptation leads to 52% energy savings at 1 GHz operation. The second platform applies Razor for robust operation in the presence of radiation-induced Single Event Upsets. These efforts clearly demonstrate how energy-efficient compute engines can be designed by combining timing-error resiliency with optimizations across algorithms, circuits, and microarchitecture boundaries.
  • Keywords
    elemental semiconductors; integrated circuit design; integrated circuit reliability; microprocessor chips; silicon; system-on-chip; ARM-based industrial prototype; Razor dynamic voltage; Si; error-resilient design; frequency 1 GHz; frequency management; integrated circuits; microprocessors; radiation-induced single event upsets; systems-on-chip; timing-error resiliency; Energy efficiency; Flip-flops; Inverters; Latches; Pipelines; Reliability; Timing; Energy-efficient Digital Design; Error-resilient Computing; Variation Mitigation
  • fLanguage
    English
  • Journal_Title
    Device and Materials Reliability, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1530-4388
  • Type

    jour

  • DOI
    10.1109/TDMR.2015.2389038
  • Filename
    7001640