DocumentCode :
228794
Title :
ECC Parity: A Technique for Efficient Memory Error Resilience for Multi-Channel Memory Systems
Author :
Xun Jian ; Kumar, Rakesh
Author_Institution :
Univ. of Illinois at Urbana-Champaign, Urbana, IL, USA
fYear :
2014
fDate :
16-21 Nov. 2014
Firstpage :
1035
Lastpage :
1046
Abstract :
Servers and HPC systems often use a strong memory error correction code, or ECC, to meet their reliability and availability requirements. However, these ECCs often require significant capacity and/or power overheads. We observe that since memory channels are independent from one another, error correction typically needs to be performed for one channel at a time. Based on this observation, we show that instead of always storing the actual ECC correction bits in memory, as existing systems do, it is sufficient to store the bitwise parity of the ECC correction bits of different channels for fault-free memory regions, and to store the actual ECC correction bits only for faulty memory regions. By trading off the resultant ECC capacity overhead reduction for improved memory energy efficiency, the proposed technique reduces memory energy per instruction by 54.4% and 20.6%, respectively, compared to a commercial chipkill-correct ECC and a DIMM-kill correct ECC, while incurring similar or lower capacity overheads.
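The core idea in the abstract, keeping only the bitwise parity of the per-channel ECC correction bits, can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration and not taken from the paper: `correction_bits` is a toy stand-in for a real ECC encoder (a CRC is used only because it is deterministic; it cannot actually correct errors), and the sketch assumes that the faulty channel has already been identified by some separate detection mechanism and that all other channels are fault-free.

```python
from functools import reduce
from typing import List, Sequence
import zlib


def correction_bits(data: bytes) -> bytes:
    """Toy stand-in for an ECC encoder (hypothetical, not the paper's code).

    A real system would compute chipkill-style correction bits here.
    """
    return zlib.crc32(data).to_bytes(4, "big")


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def ecc_parity(channel_data: Sequence[bytes]) -> bytes:
    """Bitwise parity (XOR) of the correction bits of all channels.

    Only this value is stored for fault-free regions, instead of one set
    of correction bits per channel.
    """
    return reduce(xor_bytes, (correction_bits(d) for d in channel_data))


def recover_correction_bits(
    channel_data: Sequence[bytes], parity: bytes, faulty: int
) -> bytes:
    """Rebuild the faulty channel's correction bits.

    Recomputes the correction bits of every healthy channel from its data
    and XORs them out of the stored parity; what remains is the correction
    bits the faulty channel had when the parity was written.
    """
    result = parity
    for i, data in enumerate(channel_data):
        if i != faulty:
            result = xor_bytes(result, correction_bits(data))
    return result


# Four channels, each holding one (made-up) memory line.
channels: List[bytes] = [b"line-ch0", b"line-ch1", b"line-ch2", b"line-ch3"]
parity = ecc_parity(channels)
original_bits = correction_bits(channels[2])

# Channel 2 is corrupted after the parity was stored.
corrupted = list(channels)
corrupted[2] = b"l!ne-ch2"
recovered_bits = recover_correction_bits(corrupted, parity, faulty=2)
```

In this sketch `recovered_bits` equals the correction bits computed from the uncorrupted line, which is what a real decoder would need to correct channel 2. Because the parity can serve only one channel at a time, the abstract's scheme stores the actual correction bits for regions known to be faulty.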
Keywords :
DRAM chips; error correction codes; parallel processing; parity check codes; storage management; DIMM-kill correct ECC; ECC capacity overhead reduction; ECC correction bits; ECC parity; HPC system; availability requirement; bitwise parity; commercial chipkill-correct ECC; fault-free memory region; faulty memory region; memory channel; memory energy efficiency; memory energy per instruction; memory error correction code; memory error resilience; multichannel memory system; reliability requirement; Circuit faults; Error correction; Error correction codes; Layout; Memory management; Optimization; Resilience;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
High Performance Computing, Networking, Storage and Analysis, SC14: International Conference for
Conference_Location :
New Orleans, LA
Print_ISBN :
978-1-4799-5499-5
Type :
conf
DOI :
10.1109/SC.2014.89
Filename :
7013071