Title :
Reliability of scrubbing recovery-techniques for memory systems
Author :
Saleh, Abdallah M. ; Serrano, Juan J. ; Patel, Janak H.
Author_Institution :
AT&T Bell Lab., Holmdel, NJ, USA
fDate :
4/1/1990
Abstract :
The authors analyze transient-error recovery in fault-tolerant memory systems that use a scrubbing technique based on single-error-correction and double-error-detection (SEC-DED) codes. When a single error is detected in a memory word, the error is corrected and the word is rewritten in its original location. Two models are discussed: (1) exponentially distributed scrubbing, where the time between checks of a memory word is assumed to be exponentially distributed, and (2) deterministic scrubbing, where a memory word is checked periodically. Reliability functions and mean-time-to-failure (MTTF) equations are derived and estimated for four memory systems subject to transient errors with exponentially distributed arrival times. The results for the two scrubbing techniques are compared with those for memory systems without redundancy and with SEC-DED codes only. A major contribution of the analysis is a set of easy-to-use expressions for the MTTF of memories.
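The two scrubbing models named in the abstract can be illustrated with a small Monte Carlo sketch. This is not the authors' derivation: it assumes a single memory word of `n_bits` bits, a per-bit transient-error rate `lam`, instantaneous scrubbing, and failure as soon as a second bit is in error before the next scrub. All function names and parameter values are hypothetical and chosen only for illustration. For exponentially distributed scrubbing at rate `mu`, this simplified three-state chain (clean, one error, failed) has MTTF (a + b + mu) / (a b), with a = n_bits * lam and b = (n_bits - 1) * lam, which the simulation can be checked against.

```python
import math
import random


def mttf_exponential_closed_form(n_bits, lam, mu):
    """MTTF of the assumed chain: clean -> one error -> failed."""
    a = n_bits * lam          # rate of the first bit error in a clean word
    b = (n_bits - 1) * lam    # rate of a second error in a different bit
    return (a + b + mu) / (a * b)


def simulate_ttf(n_bits, lam, next_scrub, rng):
    """Simulate one time-to-failure of a single SEC-DED protected word.

    next_scrub(t, rng) returns the time of the first scrub after time t.
    """
    a = n_bits * lam
    b = (n_bits - 1) * lam
    t = 0.0
    while True:
        t += rng.expovariate(a)                   # first (correctable) error
        scrub_at = next_scrub(t, rng)             # when the word is next checked
        second_error_at = t + rng.expovariate(b)  # candidate second error
        if second_error_at < scrub_at:
            return second_error_at                # double error: word is lost
        t = scrub_at                              # scrub rewrote the word; clean again


def estimate_mttf(n_bits, lam, next_scrub, trials=20000, seed=1):
    """Average time-to-failure over independent simulated runs."""
    rng = random.Random(seed)
    total = sum(simulate_ttf(n_bits, lam, next_scrub, rng) for _ in range(trials))
    return total / trials


def exponential_scrub(mu):
    """Model (1): time until the next check is exponential with rate mu."""
    return lambda t, rng: t + rng.expovariate(mu)


def periodic_scrub(period):
    """Model (2): checks happen at fixed times period, 2*period, ..."""
    return lambda t, rng: (math.floor(t / period) + 1) * period


if __name__ == "__main__":
    n_bits, lam, mu = 8, 0.01, 1.0   # illustrative values, not from the paper
    print("closed form (exponential):", mttf_exponential_closed_form(n_bits, lam, mu))
    print("simulated   (exponential):", estimate_mttf(n_bits, lam, exponential_scrub(mu)))
    print("simulated   (periodic):   ", estimate_mttf(n_bits, lam, periodic_scrub(1.0 / mu)))
```

Under these assumptions, periodic scrubbing with the same mean interval tends to leave a single error exposed for less time on average, so its simulated MTTF should come out higher than the exponential case. The sketch says nothing about the specific expressions derived in the paper.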
Keywords :
digital storage; error correction codes; error detection codes; failure analysis; fault tolerant computing; integrated memory circuits; reliability theory; MTTF; SEC-DED codes; deterministic scrubbing; double-error-detection; exponentially distributed scrubbing; fault-tolerant memory systems; mean-time-to-failure; memory systems; reliability; scrubbing recovery-techniques; single-error-correction; transient-error recovery; Alpha particles; Computer displays; Computer errors; Electromagnetic transients; Error analysis; Error correction; Fault tolerant systems; Frequency; Reliability theory; Very large scale integration
Journal_Title :
Reliability, IEEE Transactions on