DocumentCode :
3065139
Title :
Adaptive two-level blocking coordinated checkpointing based on recovery cost
Author :
Lotfi, Mehdi ; Motamedi, Seyed Ahmad ; Bandarabadi, Mojtaba
Author_Institution :
Electr. Eng. Dept., Amirkabir Univ. of Technol., Tehran
fYear :
2009
fDate :
15-17 March 2009
Firstpage :
113
Lastpage :
117
Abstract :
In this paper we introduce a new adaptive two-level blocking coordinated checkpointing scheme for cluster computing systems. The first level is local checkpointing: computing nodes save checkpoints to local disk based on the transient failure rate. If a transient failure occurs in a computing node, the process can recover from the local disk. The second level is global checkpointing: computing nodes send their checkpoints to highly reliable global stable storage on the network, based on the expected recovery time in the case of a permanent failure. If a permanent failure occurs in a computing node, that node can no longer be used and the process recovers from global storage on a new computing node. Transient failures are more probable than permanent failures, so global checkpoints are taken far less often than local checkpoints. With this method, coordinated checkpointing overhead is reduced and is proportional to the transient and permanent failure rates of the cluster system.
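The two recovery paths described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the names `Failure`, `recovery_plan`, and `checkpoint_counts` are hypothetical, and the checkpoint intervals are plain inputs here, whereas the paper derives them adaptively from failure rates and expected recovery time.

```python
from enum import Enum


class Failure(Enum):
    """Failure classes named in the abstract."""
    TRANSIENT = "transient"
    PERMANENT = "permanent"


def recovery_plan(failure):
    """Return (checkpoint source, restart location) for a failure type."""
    if failure is Failure.TRANSIENT:
        # The node survives, so the process restarts from its local disk.
        return ("local disk", "same node")
    # The node is lost, so a new node restores from global stable storage.
    return ("global stable storage", "new node")


def checkpoint_counts(run_time, local_interval, global_interval):
    """Count checkpoints of each level taken during a run.

    Assumption: intervals are fixed and given in the same time unit as
    run_time. The paper adapts them; this sketch does not.
    """
    local = int(run_time // local_interval)
    global_ = int(run_time // global_interval)
    return (local, global_)
```

With a short local interval and a long global interval, `checkpoint_counts` returns many more local than global checkpoints, which matches the abstract's observation that global checkpointing is the rarer, more expensive level.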
Keywords :
checkpointing; workstation clusters; adaptive two-level blocking coordinated checkpointing; cluster computing system; computing nodes; permanent failure; process recovery; recovery cost; transient failure rate; Adaptive systems; Checkpointing; Clustering algorithms; Communication channels; Computer errors; Computer networks; Costs; Delay; Frequency synchronization; Space technology; adaptive two-level checkpointing; blocking coordinated checkpointing; failure; recovery cost;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
System Theory, 2009. SSST 2009. 41st Southeastern Symposium on
Conference_Location :
Tullahoma, TN
ISSN :
0094-2898
Print_ISBN :
978-1-4244-3324-7
Electronic_ISBN :
0094-2898
Type :
conf
DOI :
10.1109/SSST.2009.4806839
Filename :
4806839