DocumentCode :
1350537
Title :
Reliability-Driven ECC Allocation for Multiple Bit Error Resilience in Processor Cache
Author :
Paul, Somnath ; Cai, Fang ; Zhang, Xinmiao ; Bhunia, Swarup
Author_Institution :
Dept. of Electr. Eng. & Comput. Sci., Case Western Reserve Univ., Cleveland, OH, USA
Volume :
60
Issue :
1
fYear :
2011
Firstpage :
20
Lastpage :
34
Abstract :
With increasing parameter variations in nanometer technologies, on-chip cache in processor is becoming highly vulnerable to runtime failures induced by “soft error,” voltage, or thermal noise and aging effects. Nondeterministic and unreliable memory operation due to these runtime failures can be addressed by: 1) designing the memory for worst-case scenarios and/or 2) runtime error detection and correction. Worst-case guard-banding can lead to overly pessimistic results for cell footprint and power. On the other hand, conventional error correcting code (ECC) used in processor cache has very limited correction capability, making it insufficient to protect memory in scaled technologies (sub-45 nm), which are vulnerable to multiple-bit failures in a word (64-bit). The requirement to tolerate multibit failures is accentuated with supply voltage scaling for low-power operation. We note that due to inter and intra-die parameter variations, different memory blocks move to different reliability corners. A uniform ECC protection for all memory blocks fails to account for the distribution of vulnerability across memory blocks. On the other hand, it can lead to overly pessimistic results if the worst-case vulnerability of a memory block is accounted for during ECC allocation. In this paper, we propose a reliability-driven ECC allocation scheme that matches the relative vulnerability of a memory block (determined using postfabrication characterization) with appropriate ECC protection. We achieve postfabrication variable ECC allocation by storing the check bits in the “ways” of an associative cache. We use shortened Bose-Chaudhuri-Hocquenghem (BCH) cyclic code with zero padding, which provides high random error correction capability with modest amount of check bits. Moreover, we propose efficient circuit/architecture-level optimizations of the ECC encoding/decoding logic to minimize the impact on area, performance, and energy. 
Simulation results for SPEC2000 benchmarks show that such a variable ECC scheme tolerates high failure rates with negligible performance (four percent) and area (0.2 percent) penalty.
Keywords :
BCH codes; benchmark testing; cache storage; circuit reliability; error correction codes; fault tolerant computing; low-power electronics; power aware computing; system-on-chip; Bose Chaudhuri Hocquenghem cyclic code; error correcting code; memory block; multibit failure tolerance; multiple bit error resilience; nanometer technology; on-chip cache; processor cache; random error correction capability; reliability driven ECC allocation; vulnerability distribution; zero padding; Aging; Arrays; Error correction codes; Random access memory; Reliability; Resource management; Runtime; Cache; process variation; runtime failures; soft error; variable ECC allocation.;
fLanguage :
English
Journal_Title :
Computers, IEEE Transactions on
Publisher :
IEEE
ISSN :
0018-9340
Type :
jour
DOI :
10.1109/TC.2010.203
Filename :
5601695