• DocumentCode
    177330
  • Title
    MemGuard: A low cost and energy efficient design to support and enhance memory system reliability
  • Author
    Long Chen ; Zhao Zhang
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Iowa State Univ., Ames, IA, USA
  • fYear
    2014
  • fDate
    14-18 June 2014
  • Firstpage
    49
  • Lastpage
    60
  • Abstract
    Memory system reliability is increasingly a concern as memory cell density and capacity continue to grow. The conventional approach is to use redundant memory bits for error detection and correction, with significant storage, cost and power overheads. In this paper, we propose a novel, system-level scheme called MemGuard for memory error detection. With OS-based checkpointing, it is also able to recover program execution from memory errors. The memory error detection of MemGuard is motivated by memory integrity verification using log hashes. It is much stronger than SECDED in error detection, incurs negligible hardware cost and energy overhead and no storage overhead, and is compatible with various memory organizations. It may play the role of ECC memory in consumer-level computers and mobile devices, without the shortcomings of ECC memory. In server computers, it may complement SECDED ECC or Chipkill Correct by providing even stronger error detection. We have comprehensively investigated and evaluated the feasibility and reliability of MemGuard. We show that, using an incremental multiset hash function and a non-cryptographic hash function, the performance and energy overheads of MemGuard are negligible. We use mathematical deduction and synthetic simulation to prove that MemGuard is robust and reliable.
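    The log-hash idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the class `LogHashMemory`, the helper `_mix64`, the 64-bit additive multiset hash, and the omission of timestamps are all assumptions made for illustration. Every value entering memory is added to a write hash, every value leaving memory is added to a read hash, and a final sweep over memory should make the two hashes equal unless a stored value changed in between.

```python
MASK = (1 << 64) - 1

def _mix64(x):
    # Hypothetical stand-in for a non-cryptographic hash (splitmix64-style finalizer).
    z = (x + 0x9E3779B97F4A7C15) & MASK
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK
    return z ^ (z >> 31)

def _h(addr, value):
    # Hash of one (address, value) pair; values are assumed to fit in 64 bits.
    return _mix64(_mix64(addr) ^ value)

class LogHashMemory:
    """Toy memory guarded by two incremental multiset hashes (assumed design)."""

    def __init__(self, size):
        self.cells = [0] * size
        self.read_hash = 0
        self.write_hash = 0
        # Initialising memory counts as writing a zero to every address.
        for addr in range(size):
            self.write_hash = (self.write_hash + _h(addr, 0)) & MASK

    def write(self, addr, value):
        # The old content leaves memory, so it is logged in the read hash.
        self.read_hash = (self.read_hash + _h(addr, self.cells[addr])) & MASK
        self.cells[addr] = value
        self.write_hash = (self.write_hash + _h(addr, value)) & MASK

    def read(self, addr):
        value = self.cells[addr]
        # A read is logged, then the same value is logged as written back.
        self.read_hash = (self.read_hash + _h(addr, value)) & MASK
        self.write_hash = (self.write_hash + _h(addr, value)) & MASK
        return value

    def check(self):
        # Sweep all of memory into a copy of the read hash and compare.
        total = self.read_hash
        for addr, value in enumerate(self.cells):
            total = (total + _h(addr, value)) & MASK
        return total == self.write_hash

mem = LogHashMemory(16)
mem.write(3, 42)
mem.read(3)
ok_before = mem.check()      # no error injected yet
mem.cells[3] ^= 1 << 5       # simulate a bit flip in the stored value
ok_after = mem.check()       # the hashes no longer agree
```

    Because the hashes are updated by modular addition, each access costs one hash of a single word and one add, which is the sense in which the multiset hash is incremental. No per-word check bits are stored, in line with the abstract's claim of no storage overhead.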
  • Keywords
    checkpointing; cryptography; error correction; error detection; storage management; Chipkill Correct; ECC memory; MemGuard; OS-based checkpointing; SECDED ECC; consumer-level computers; energy efficient design; energy overhead; hardware cost; incremental multiset hash function; mathematical deduction; memory cell density; memory error detection; memory organizations; memory system reliability; mobile devices; noncryptographic hash function; program execution; redundant memory bits; storage overhead; synthetic simulation; system-level scheme; Computers; Error analysis; Error correction codes; Memory management; Organizations; Random access memory; Reliability
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Architecture (ISCA), 2014 ACM/IEEE 41st International Symposium on
  • Conference_Location
    Minneapolis, MN
  • Print_ISBN
    978-1-4799-4396-8
  • Type
    conf
  • DOI
    10.1109/ISCA.2014.6853221
  • Filename
    6853221