DocumentCode
3065197
Title
Proactive blocking coordinated checkpointing with dynamic intervals
Author
Lotfi, Mehdi ; Motamedi, Seyed Ahmad ; Bandarabadi, Mojtaba
Author_Institution
Electr. Eng. Dept., Amirkabir Univ. of Technol., Tehran
fYear
2009
fDate
15-17 March 2009
Firstpage
118
Lastpage
121
Abstract
In this paper we introduce a new proactive blocking coordinated checkpointing method with a dynamic interval for cluster computing systems. Many current schemes for increasing the availability of cluster computing systems rely on either redundancy in space or redundancy in time (reactive methods). These methods impose overhead on the cluster computing system even during failure-free execution. To minimize the performance loss (rollback and checkpoint overheads) caused by unexpected failures or by the unnecessary overhead of fault-tolerance mechanisms, we present a proactive method for the blocking coordinated checkpointing strategy. Existing checkpointing methods are static, with a constant checkpointing interval, and are based on the exponential distribution function. In this paper we instead use the Weibull distribution function to determine a dynamic interval. Our method is based on an analysis of failure data from the LANL cluster system. Experimental results show that the average execution time of the NAS application is significantly reduced by the proposed method.
Keywords
Weibull distribution; checkpointing; exponential distribution; program diagnostics; software fault tolerance; software performance evaluation; workstation clusters; Weibull distribution function; cluster computing system; dynamic interval; exponential distribution function; failure free execution time; fault tolerant mechanism; performance loss minimization; proactive blocking coordinated checkpointing; static checkpointing method; Availability; Clustering algorithms; Data analysis; Fault tolerance; Frequency synchronization; Hardware; Redundancy; Space technology; blocking coordinated checkpointing; proactive checkpointing
fLanguage
English
Publisher
IEEE
Conference_Titel
System Theory, 2009. SSST 2009. 41st Southeastern Symposium on
Conference_Location
Tullahoma, TN
ISSN
0094-2898
Print_ISBN
978-1-4244-3324-7
Electronic_ISBN
0094-2898
Type
conf
DOI
10.1109/SSST.2009.4806842
Filename
4806842