DocumentCode
3013859
Title
The average availability of parallel checkpointing systems and its importance in selecting runtime parameters
Author
Plank, J.S. ; Thomason, M.G.
Author_Institution
Dept. of Comput. Sci., Tennessee Univ., Knoxville, TN, USA
fYear
1999
fDate
15-18 June 1999
Firstpage
250
Lastpage
257
Abstract
Performance prediction of checkpointing systems in the presence of failures is a well-studied research area. While the literature abounds with performance models of checkpointing systems, none address the issue of selecting runtime parameters other than the optimal checkpointing interval. In particular, the issue of processor allocation is typically ignored. In this paper we briefly present a performance model for long-running parallel computations that execute with checkpointing enabled. We then discuss how it is relevant to today's parallel computing environments and software, and present case studies of using the model to select runtime parameters.
Keywords
parallel programming; software fault tolerance; software performance evaluation; checkpointing systems; parallel checkpointing systems; parallel computing; performance models; processor allocation; runtime parameters; Checkpointing; Computer science; Distributed computing; Electrical capacitance tomography; Electronic switching systems; Parallel processing; Runtime;
fLanguage
English
Publisher
IEEE
Conference_Titel
Fault-Tolerant Computing, 1999. Digest of Papers. Twenty-Ninth Annual International Symposium on
Conference_Location
Madison, WI, USA
ISSN
0731-3071
Print_ISBN
0-7695-0213-X
Type
conf
DOI
10.1109/FTCS.1999.781059
Filename
781059