DocumentCode
177330
Title
MemGuard: A low cost and energy efficient design to support and enhance memory system reliability
Author
Long Chen ; Zhao Zhang
Author_Institution
Dept. of Electr. & Comput. Eng., Iowa State Univ., Ames, IA, USA
fYear
2014
fDate
14-18 June 2014
Firstpage
49
Lastpage
60
Abstract
Memory system reliability is increasingly a concern as memory cell density and capacity continue to grow. The conventional approach is to use redundant memory bits for error detection and correction, with significant storage, cost and power overheads. In this paper, we propose a novel, system-level scheme called MemGuard for memory error detection. With OS-based checkpointing, it is also able to recover program execution from memory errors. The memory error detection of MemGuard is motivated by memory integrity verification using log hashes. It is much stronger than SECDED in error detection, incurs negligible hardware cost and energy overhead and no storage overhead, and is compatible with various memory organizations. It may play the role of ECC memory in consumer-level computers and mobile devices, without the shortcomings of ECC memory. In server computers, it may complement SECDED ECC or Chipkill Correct by providing even stronger error detection. We have comprehensively investigated and evaluated the feasibility and reliability of MemGuard. We show that using an incremental multiset hash function and a non-cryptographic hash function, the performance and energy overheads of MemGuard are negligible. We use mathematical deduction and synthetic simulation to prove that MemGuard is robust and reliable.
Keywords
checkpointing; cryptography; error correction; error detection; storage management; Chipkill Correct; ECC memory; MemGuard; OS-based checkpointing; SECDED ECC; consumer-level computers; energy efficient design; energy overhead; error correction; hardware cost; incremental multiset hash function; mathematical deduction; memory cell density; memory error detection; memory organizations; memory system reliability; mobile devices; noncryptographic hash function; program execution; redundant memory bits; storage overhead; synthetic simulation; system-level scheme; Computers; Error analysis; Error correction codes; Memory management; Organizations; Random access memory; Reliability
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Architecture (ISCA), 2014 ACM/IEEE 41st International Symposium on
Conference_Location
Minneapolis, MN
Print_ISBN
978-1-4799-4396-8
Type
conf
DOI
10.1109/ISCA.2014.6853221
Filename
6853221