DocumentCode :
2589573
Title :
On the optimum recovery of distributed programs
Author :
Silva, Luis Moura ; Silva, João Gabriel
Author_Institution :
Coimbra Univ., Portugal
fYear :
1994
fDate :
5-8 Sep 1994
Firstpage :
704
Lastpage :
711
Abstract :
In a previous paper (1992), the authors presented a checkpointing algorithm for distributed applications. That algorithm is based on a non-blocking coordinated global checkpoint of the distributed program. However, the associated rollback algorithm does not provide the best results, since in most cases it forces the rollback of all the processes. This paper presents two algorithms for roll-back-recovery that minimize the number of processes which need to roll back. One of the algorithms is oriented to systems that use message logging, while the other is more general and can be used in systems that rely only on a coordinated checkpoint and do not log messages. We will show that our proposal achieves the optimum results in minimizing the number of processes that have to roll back.
Keywords :
parallel algorithms; parallel programming; software fault tolerance; system recovery; checkpointing; distributed algorithms; distributed programs; distributed systems; fault-tolerance; message logging; optimum recovery; roll-back-recovery; vector time; Checkpointing; Computer applications; Computer crashes; Computer hacking; Concurrent computing; Delay; Distributed algorithms; Distributed computing; Fault tolerance; Proposals;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
EUROMICRO 94. System Architecture and Integration. Proceedings of the 20th EUROMICRO Conference.
Conference_Location :
Liverpool
Print_ISBN :
0-8186-6430-4
Type :
conf
DOI :
10.1109/EURMIC.1994.390340
Filename :
390340