DocumentCode
2601046
Title
Lazy checkpoint coordination for bounding rollback propagation
Author
Wang, Yi-Min ; Fuchs, W. Kent
Author_Institution
Univ. of Illinois at Urbana-Champaign, IL, USA
fYear
1993
fDate
6-8 Oct 1993
Firstpage
78
Lastpage
85
Abstract
The technique of lazy checkpoint coordination, which preserves process autonomy while employing communication-induced checkpoint coordination for bounding rollback propagation, is proposed. The notion of laziness is introduced to control the coordination frequency and allow a flexible tradeoff between the cost of checkpoint coordination and the average rollback distance. Worst-case overhead analysis provides a means for estimating the extra checkpoint overhead. Communication trace-driven simulation for several parallel programs is used to evaluate the benefits of the proposed scheme.
Keywords
fault tolerant computing; parallel programming; system monitoring; system recovery; average rollback distance; checkpoint overhead; communication-induced checkpoint coordination; coordination frequency; lazy checkpoint coordination; parallel programs; process autonomy; rollback propagation; Checkpointing; Contracts; Costs; Frequency measurement; History; Laboratories; Message passing; NASA; Performance evaluation; Runtime;
fLanguage
English
Publisher
IEEE
Conference_Titel
Reliable Distributed Systems, 1993. Proceedings., 12th Symposium on
Conference_Location
Princeton, NJ
Print_ISBN
0-8186-4310-2
Type
conf
DOI
10.1109/RELDIS.1993.393471
Filename
393471