Author :
Yazdanbakhsh, Amir ; Balasubramanian, Raghuraman ; Nowatzki, Tony ; Sankaralingam, Karthikeyan
Author_Institution :
Georgia Inst. of Technol., Atlanta, GA, USA
Abstract :
This article develops a comprehensive technique for in-field failure prediction in many-core processors, addressing wear-out of logic and static RAM (SRAM) in harsh environments. The authors develop three principles. First, virtually aging a processor by momentarily reducing its voltage exposes wear-out failures. Second, sampled redundancy can be used to capture logic wear-out failures, because their underlying fault model is delay faults, which makes them poorly suited to test-vector-based techniques. Third, wear-out in SRAMs decreases their noise margin, and the end result can be effectively modeled as a stuck-at fault, thus allowing asymmetric checkers like built-in self-test to work effectively. The authors' design comprises two components (Aged-SDMR, which combines sampling and dual-modular redundancy, and Aged-AsymChk, which uses an asymmetric checker); has a simple implementation; and delivers low complexity, low overheads, and high accuracy. In addition to ensuring no corruptions or missed errors from wear-out failures, the full system predicts failures within 0.4 days of their appearance for logic and within milliseconds for SRAM. Furthermore, compared to SRAMs protected only with error-correcting code and decommissioned on first failure, the authors' approach extends lifetime by 14 months on average.
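The first two principles (virtual aging plus sampled dual-modular redundancy) can be illustrated with a toy simulation. This is only a minimal sketch under stated assumptions, not the authors' Aged-SDMR design: the function names, the arithmetic workload, the rule that inputs divisible by 16 exercise a slow path, and the choice of which unit runs at reduced voltage are all hypothetical and chosen purely for illustration.

```python
import random

def reference_unit(x):
    # Hypothetical healthy unit at nominal voltage: always correct.
    return (x * x + 1) % 251

def worn_unit(x, low_voltage):
    # Toy delay-fault model (assumption): one input-dependent slow path
    # misses timing only while the voltage is momentarily reduced.
    result = (x * x + 1) % 251
    if low_voltage and x % 16 == 0:
        return result ^ 1  # one flipped bit stands in for a late signal
    return result

def sampled_dmr(inputs, sample_rate, rng):
    """Check a random sample of operations against a second unit.

    Sampled operations run on the unit under test at reduced voltage
    (virtual aging) and are compared with the reference result.
    Returns (operations sampled, mismatches observed).
    """
    sampled = mismatches = 0
    for x in inputs:
        if rng.random() < sample_rate:
            sampled += 1
            if worn_unit(x, low_voltage=True) != reference_unit(x):
                mismatches += 1
        else:
            worn_unit(x, low_voltage=False)  # normal, unchecked execution
    return sampled, mismatches

if __name__ == "__main__":
    rng = random.Random(0)
    sampled, mismatches = sampled_dmr(range(100_000), sample_rate=0.01, rng=rng)
    print(f"sampled={sampled} mismatches={mismatches}")
```

In this toy model the fault never appears at nominal voltage, so unsampled execution stays correct, while the sampled windows surface the marginal path before it would fail in normal operation.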
Keywords :
SRAM chips; failure analysis; integrated circuit reliability; redundancy; SRAM; circuit failure prediction; error-correcting code; manycore processor; static RAM; vector-based technique; virtual aging; Aging; Built-in self-test; Failure analysis; Fault tolerance; Logic gates; Predictive models; Program processors; SRAM cells; dual-modular redundancy; failure prediction; fault tolerance; reliability; sampling; wear out;