DocumentCode :
1524376
Title :
An on-line algorithm for checkpoint placement
Author :
Ziv, Avi ; Bruck, Jehoshua
Author_Institution :
MATAM-Adv. Technol. Center, IBM Israel Sci. & Technol. Center, Haifa, Israel
Volume :
46
Issue :
9
fYear :
1997
fDate :
9/1/1997 12:00:00 AM
Firstpage :
976
Lastpage :
985
Abstract :
Checkpointing reduces the time to recover from a fault by saving intermediate states of the program in reliable storage. The length of the intervals between checkpoints affects the execution time of programs: long intervals lead to long reprocessing times, while overly frequent checkpointing leads to high checkpointing overhead. In this paper, we present an on-line algorithm for placement of checkpoints. The algorithm uses knowledge of the current cost of a checkpoint when it decides whether or not to place a checkpoint. The total execution-time overhead when the proposed algorithm is used is smaller than the overhead when fixed intervals are used. Although the proposed algorithm uses only on-line knowledge about the cost of checkpointing, its behavior is close to that of the off-line optimal algorithm, which uses complete knowledge of the checkpointing cost.
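The trade-off described in the abstract can be illustrated with a small sketch. This is not the algorithm from the paper, whose decision rule is not given in this record. It is a hypothetical heuristic that assumes faults arrive at a constant rate and applies the well-known first-order approximation for the best fixed interval, sqrt(2 * cost / fault_rate), to whatever the checkpoint cost happens to be at the moment of the decision. The function names and parameters are illustrative assumptions.

```python
import math


def first_order_interval(checkpoint_cost, fault_rate):
    """First-order estimate of the interval that balances checkpoint
    overhead (cost / T) against expected reprocessing (fault_rate * T / 2).

    Assumption: constant fault rate; not taken from the paper."""
    return math.sqrt(2.0 * checkpoint_cost / fault_rate)


def should_checkpoint(elapsed, current_cost, fault_rate):
    """Hypothetical on-line rule: place a checkpoint once the time since
    the last checkpoint reaches the interval suggested by the *current*
    checkpoint cost. A cheap checkpoint is therefore taken sooner."""
    return elapsed >= first_order_interval(current_cost, fault_rate)


if __name__ == "__main__":
    fault_rate = 0.01  # assumed faults per time unit
    for cost in (0.5, 1.0, 2.0):
        decision = should_checkpoint(elapsed=12.0, current_cost=cost,
                                     fault_rate=fault_rate)
        print(f"cost={cost}: checkpoint now? {decision}")
```

Under this rule the same elapsed time can lead to different decisions depending on the current cost, which is the kind of cost-aware behavior the abstract contrasts with fixed intervals.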
Keywords :
program diagnostics; software fault tolerance; system recovery; checkpoint placement; checkpointing; fault-tolerant computing; fixed intervals; intermediate states; on-line algorithm; performance optimization; Availability; Checkpointing; Cost function; Fault detection; Optimization; Program processors; Programming profession; Time measurement;
fLanguage :
English
Journal_Title :
Computers, IEEE Transactions on
Publisher :
IEEE
ISSN :
0018-9340
Type :
jour
DOI :
10.1109/12.620479
Filename :
620479