Title :
RAMpage: Graceful Degradation Management for Memory Errors in Commodity Linux Servers
Author :
Schirmeier, Horst ; Neuhalfen, Jens ; Korb, Ingo ; Spinczyk, Olaf ; Engel, Michael
Author_Institution :
Dept. of Comput. Sci. 12, Tech. Univ., Dortmund, Germany
Abstract :
Memory errors are a major source of reliability problems in current computers. Undetected errors may result in program termination or, even worse, silent data corruption. Recent studies have shown that permanent memory errors occur an order of magnitude more frequently than previously assumed and regularly affect everyday operation. Often, neither additional circuitry to support hardware-based error detection nor downtime for performing hardware tests can be afforded. In the case of permanent memory errors, a system faces two challenges: detecting errors as early as possible and handling them while avoiding system downtime. To increase system reliability, we have developed RAMpage, an online memory testing infrastructure for commodity x86-64-based Linux servers, which is capable of efficiently detecting memory errors and which provides graceful degradation by withdrawing affected memory pages from further use. We describe the design and implementation of RAMpage and present results of an extensive qualitative as well as quantitative evaluation.
Keywords :
Linux; program testing; random-access storage; software reliability; RAMpage; commodity Linux servers; graceful degradation management; hardware test; hardware-based error detection; online memory testing infrastructure; permanent memory errors; program termination; quantitative evaluation; reliability problems; silent data corruption; system reliability; Degradation; Kernel; Linux; Memory management; Random access memory; Servers; Testing; DRAM chips; Fault tolerance; Operating systems
Conference_Title :
Dependable Computing (PRDC), 2011 IEEE 17th Pacific Rim International Symposium on
Conference_Location :
Pasadena, CA
Print_ISBN :
978-1-4577-2005-5
Electronic_ISBN :
978-0-7695-4590-5
DOI :
10.1109/PRDC.2011.20