DocumentCode
2737900
Title
Portable checkpointing and recovery
Author
Silva, Luis M. ; Silva, João G. ; Chapple, Simon ; Clarke, Lyndon
Author_Institution
Dept. de Engenharia Inf., Coimbra Univ., Portugal
fYear
1995
fDate
2-4 Aug 1995
Firstpage
188
Lastpage
195
Abstract
This paper presents a checkpointing scheme implemented in a parallel library that runs on top of CHIMP/MPI. The main goals of the checkpointing mechanism are portability and efficiency. It runs on every platform supported by MPI in a machine-independent way. The scheme allows the migration of checkpoints and offers a flexible recovery mechanism based on data-reconfiguration. Some performance results are presented at the end of the paper, together with some techniques that can be used to increase the efficiency of the checkpointing mechanism.
Keywords
operating systems (computers); parallel machines; software portability; system recovery; data-reconfiguration; CHIMP/MPI; flexible recovery mechanism; parallel library; portability; portable checkpointing; recovery; Checkpointing; Computer crashes; Distributed computing; Guidelines; Libraries; Operating systems; Parallel machines; Parallel processing; Proposals; Workstations;
fLanguage
English
Publisher
IEEE
Conference_Titel
High Performance Distributed Computing, 1995., Proceedings of the Fourth IEEE International Symposium on
Conference_Location
Washington, DC
ISSN
1082-8907
Print_ISBN
0-8186-7088-6
Type
conf
DOI
10.1109/HPDC.1995.518709
Filename
518709