Title :
Recovery-Driven Design: Exploiting Error Resilience in Design of Energy-Efficient Processors
Author :
Kahng, Andrew B. ; Kang, Seokhyeong ; Kumar, Rakesh ; Sartori, John
Author_Institution :
Dept. of Comput. Sci. & Eng., Univ. of California at San Diego, La Jolla, CA, USA
fDate :
3/1/2012
Abstract :
Conventional computer-aided design (CAD) methodologies optimize a processor module for correct operation and prohibit timing violations during nominal operation. We propose recovery-driven design, a design approach that optimizes a processor module for a target timing error rate (ER) instead of for correct operation. The target ER is chosen based on how many errors a hardware or software error resilience mechanism can gainfully tolerate. We show that significant power benefits are possible from a recovery-driven design approach that deliberately allows errors caused by voltage overscaling to occur during nominal operation, while relying on an error resilience technique to tolerate these errors. We present a detailed evaluation and analysis of such a CAD methodology, which minimizes the power of a processor module for a target ER. We show how this design-level methodology can be extended to design recovery-driven processors, i.e., processors that are optimized to take advantage of hardware or software error resilience. We also discuss a gradual slack recovery-driven design approach that optimizes for a range of ERs to create soft processors: processors that have graceful failure characteristics and can trade throughput or output quality for additional energy savings over a range of ERs. We demonstrate significant power benefits over conventional design: 11.8% on average over all modules and ER targets, and up to 29.1% for individual modules. Processor-level benefits averaged 19.0%. Benefits increase when recovery-driven design is coupled with an error resilience mechanism or when the number of available voltage domains increases.
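The core trade-off in the abstract (overscale voltage until the timing error rate reaches a tolerable target, instead of stopping at zero errors) can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the authors' CAD methodology: each cycle is assumed to exercise one path with a known delay at nominal voltage, delay is assumed to follow the alpha-power law, and dynamic power is assumed to scale with V^2 at a fixed clock. All constants and function names (V_NOM, V_TH, ALPHA, T_CLK, scaled_delay, error_rate, min_voltage_for_target, relative_power) are hypothetical.

```python
# Toy model (assumptions, not the paper's method): per-cycle exercised path
# delays at nominal voltage, alpha-power-law delay scaling, and dynamic power
# proportional to V^2 at a fixed clock period.

V_NOM = 1.0   # hypothetical nominal supply voltage (V)
V_TH = 0.3    # hypothetical threshold voltage (V)
ALPHA = 1.3   # hypothetical velocity-saturation exponent
T_CLK = 1.0   # clock period, normalized to the nominal worst-case delay


def scaled_delay(d_nom: float, v: float) -> float:
    """Scale a nominal path delay to supply v using d ~ V / (V - Vth)^alpha."""
    factor = (v / (v - V_TH) ** ALPHA) / (V_NOM / (V_NOM - V_TH) ** ALPHA)
    return d_nom * factor


def error_rate(path_delays: list[float], v: float) -> float:
    """Fraction of cycles whose exercised path misses the clock edge at v."""
    misses = sum(1 for d in path_delays if scaled_delay(d, v) > T_CLK)
    return misses / len(path_delays)


def min_voltage_for_target(path_delays: list[float], target_er: float,
                           step: float = 0.01) -> float:
    """Lower the supply from nominal while the ER stays <= target_er."""
    v = V_NOM
    while v - step > V_TH:
        if error_rate(path_delays, v - step) > target_er:
            break  # the next step down would exceed the tolerable ER
        v -= step
    return round(v, 2)


def relative_power(v: float) -> float:
    """Dynamic power relative to nominal, assuming P ~ V^2 at a fixed clock."""
    return (v / V_NOM) ** 2


if __name__ == "__main__":
    # Hypothetical workload: 10% of cycles exercise a near-critical path.
    delays = [0.5] * 90 + [0.95] * 10
    for target in (0.0, 0.1):
        v = min_voltage_for_target(delays, target)
        print(f"target ER={target}: V={v}, relative power={relative_power(v):.2f}")
```

Because delay grows monotonically as the supply drops, the error rate is monotone in voltage, so a simple downward sweep finds the lowest voltage that meets the target. Tolerating the 10% of cycles that exercise the near-critical path lets the sweep go well below the zero-error voltage, which is the kind of slack a recovery-driven design exploits when paired with an error resilience mechanism.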
Keywords :
circuit CAD; fault tolerance; integrated circuit design; integrated circuit reliability; CAD methodology; computer-aided design; energy savings; energy-efficient processor design; failure characteristics; gradual slack recovery-driven design approach; processor module; recovery-driven processor design; soft processor; software error resilience mechanism; target timing error rate; voltage overscaling; Hardware; Optimization; Program processors; Resilience; Sensitivity; Timing; Cell sizing; error resilience; power minimization; recovery-driven design; slack redistribution;
Journal_Title :
Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on
DOI :
10.1109/TCAD.2011.2172610