DocumentCode
3444008
Title
Independent checkpointing and concurrent rollback for recovery in distributed systems: an optimistic approach
Author
Bhargava, Bharat ; Lian, Shy-Renn
Author_Institution
Dept. of Comput. Sci., Purdue Univ., West Lafayette, IN, USA
fYear
1988
fDate
10-12 Oct 1988
Firstpage
3
Lastpage
12
Abstract
A checkpoint algorithm is presented that benefits from the research in concurrency control, commit, and site recovery algorithms in transaction processing. In the authors' approach, a number of checkpointing processes, a number of rollback processes, and computations on operational processes can proceed concurrently while tolerating the failure of an arbitrary number of processes. Each process takes checkpoints independently. During recovery after a failure, a process invokes a two-phase rollback algorithm. In the first phase, it collects information about relevant message exchanges in the system; in the second phase, it uses that information to determine both the set of processes that must roll back and the set of checkpoints up to which rollback must occur. Concurrent rollbacks are completed in the order of the priorities of the recovering processes. The proposed solution is optimistic in the sense that it minimizes overhead during normal processing and therefore performs well when failures are infrequent.
Keywords
concurrency control; distributed databases; distributed processing; commit; concurrent rollback; distributed systems; independent checkpointing; recovery; site recovery algorithms; transaction processing; two-phase rollback algorithm; Checkpointing; Computer crashes; Computer science; Concurrent computing; Database systems; Delay; Machine intelligence; NASA; Virtual machining
fLanguage
English
Publisher
IEEE
Conference_Titel
Reliable Distributed Systems, 1988. Proceedings., Seventh Symposium on
Conference_Location
Columbus, OH
Print_ISBN
0-8186-0875-7
Type
conf
DOI
10.1109/RELDIS.1988.25775
Filename
25775