Title :
Processor- and memory-based checkpoint and rollback recovery
Author :
Bowen, Nicholas S.; Pradhan, D.K.
Author_Institution :
IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA
Abstract :
Several hardware-based techniques that support checkpoint and rollback recovery are presented. The focus is on hardware schemes for uniprocessors, shared-memory multiprocessors, and distributed virtual-memory systems. A taxonomy for processor and memory techniques based on the memory hierarchy is presented. This provides a basis for understanding subtle differences among the various schemes. Two classes of schemes are discussed: processor-based schemes, which handle transient faults by using transparent rollback techniques, and memory-based schemes, which roll back data instead of instructions and can be integrated with the processor techniques or exploited by higher levels of software.
Keywords :
fault tolerant computing; shared memory systems; storage management; system recovery; virtual storage; distributed virtual-memory systems; hardware schemes; hardware-based techniques; memory hierarchy; memory techniques; processor techniques; processor-based transparent rollback techniques; rollback recovery; shared-memory multiprocessors; subtle differences; taxonomy; transient faults; uniprocessors; Availability; Computer crashes; Databases; Fault detection; Fault tolerant systems; File systems; Hardware; Nonvolatile memory; Parity check codes; Taxonomy