Title :
Efficient and fault-tolerant checkpointing procedures for distributed systems
Author :
Saleh, Kassem ; Agarwal, Anjali
Author_Institution :
Dept. of Electr. & Comput. Eng., Kuwait Univ., Kuwait
Abstract :
Problems of fault tolerance in distributed systems are tackled by providing efficient and fault-tolerant procedures for checkpointing and rollback recovery. The authors propose checkpointing algorithms which can be initiated by any process in the system, or upon failure of one or more component processes as part of a backward recovery procedure. The algorithms return the most recent consistent checkpoints, require less stable storage, and do not interfere with the progress of the distributed application. Obtaining a consistent checkpoint is always guaranteed. Examples illustrating these algorithms are also provided.
Keywords :
distributed databases; fault tolerant computing; backward recovery procedure; distributed systems; fault-tolerant algorithm procedures; fault-tolerant checkpointing procedures; rollback recovery; Checkpointing; Delay; Distributed algorithms; Distributed computing; Fault tolerant systems; Joining processes; Law; Legal factors; Resumes; System recovery
Conference_Title :
Computers and Communications, 1993., Twelfth Annual International Phoenix Conference on
Conference_Location :
Tempe, AZ
Print_ISBN :
0-7803-0922-7
DOI :
10.1109/PCCC.1993.344469