Title :
A Highly Reliable Memory Subsystem Design Employing SECDED, Column Sparing, and Paging
Author_Institution :
Control Data Corporation, Minneapolis
fDate :
4/1/1986
Abstract :
This paper develops a reliability model for a paged memory system wherein the pages of memory are physically distributed among several arrays of memory chips. Any of the available pages can be used to satisfy the required memory capacity. This paper also develops a reliability model for a page or block of memory words embedded in an array. The model assumes that memory chips have failure modes that are catastrophic to a row, to a column, to the whole physical array, or to individual bits. Spare columns or data lines are used to enhance reliability. SECDED (Single Error Correction, Double Error Detection) provides the hard-fault detection mechanism and complete fault coverage for soft faults such as 1-bit upsets. A highly reliable memory system design is described that implements a paging scheme, uses a SECDED code for hard-fault detection and isolation, and uses three levels of sparing to recover from failures. The significance of this paper is that it considers failure modes associated with interfacing a memory chip into an array of memory chips. These failure modes have an impact beyond the boundaries of an individual chip; they affect the entire physical array and must be considered in the reliability model. When this is done, the reliability model permits trading off page size and array size against reliability.
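As an illustration of the SECDED property named in the abstract, the following is a minimal sketch using an extended Hamming (8,4) code. The abstract does not state the word width or the specific code used in the design, so the code size, the bit layout, and the function names `encode` and `decode` are assumptions made only for this example; practical memory words typically use much wider codes.

```python
# Illustrative extended Hamming (8,4) SECDED code.
# Assumption: this toy layout is NOT taken from the paper; it only
# demonstrates single-error correction and double-error detection.

DATA_POSITIONS = (3, 5, 6, 7)    # codeword positions holding data bits
PARITY_POSITIONS = (1, 2, 4)     # Hamming parity bits; position 0 is overall parity


def encode(nibble):
    """Encode a 4-bit value into an 8-bit codeword (list of 0/1 bits)."""
    code = [0] * 8
    for i, pos in enumerate(DATA_POSITIONS):
        code[pos] = (nibble >> i) & 1
    # Each parity bit p covers every position whose index has bit p set.
    for p in PARITY_POSITIONS:
        for pos in range(1, 8):
            if pos != p and (pos & p):
                code[p] ^= code[pos]
    # Overall parity over positions 1..7 enables double-error detection.
    for pos in range(1, 8):
        code[0] ^= code[pos]
    return code


def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected', or 'double'."""
    code = list(code)  # work on a copy
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    overall = 0
    for bit in code:
        overall ^= bit

    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flipped bits: treat as a single error and fix it.
        # A syndrome of 0 here means the overall parity bit itself flipped.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Nonzero syndrome with even overall parity: two bits flipped.
        return "double", None

    nibble = 0
    for i, pos in enumerate(DATA_POSITIONS):
        nibble |= code[pos] << i
    return status, nibble
```

In this sketch a single flipped bit (such as a 1-bit soft upset) is corrected transparently, while a double error is reported as uncorrectable. A design like the one described would presumably use such detection results to decide when to invoke sparing, though that control logic is not shown here.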
Keywords :
Error correction; Error correction codes; Fault detection; Fault tolerance; Random access memory; Read only memory; Redundancy; Reliability; Solid state circuits; Statistical analysis
Journal_Title :
Reliability, IEEE Transactions on
DOI :
10.1109/TR.1986.4335329