Title :
System-level versus user-defined checkpointing
Author :
L.M. Silva;J.G. Silva
Author_Institution :
Dept. Engenharia Inf., Coimbra Univ., Portugal
Abstract :
Checkpointing with rollback recovery is an effective technique for tolerating transient faults and preventive shutdowns. Most checkpointing schemes published in the literature were designed to be transparent to the application programmer and were implemented at the operating-system level. In recent years, there has been some work on higher-level forms of checkpointing, in which the user is responsible for checkpoint placement and must specify the checkpoint contents. We compare the two approaches, system-level and user-defined checkpointing, discuss the pros and cons of each, and present an experimental study conducted on a commercial parallel machine.
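To make the user-defined approach concrete, the following is a minimal single-process sketch in Python. It is not the authors' library or the scheme evaluated in the paper. The function names (`save_checkpoint`, `load_checkpoint`, `run`), the `fail_at` parameter used to simulate a transient fault, and the use of `pickle` are all assumptions chosen for illustration. The application decides where to checkpoint (every `checkpoint_every` iterations) and what to save (only the loop counter and the accumulator), which are the two responsibilities the abstract assigns to the user.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write the user-selected state to `path` via a temp file and rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    # The rename ensures a reader never sees a half-written checkpoint.
    os.replace(tmp, path)


def load_checkpoint(path):
    """Return the last saved state, or None if no checkpoint exists."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


def run(path, total_steps, checkpoint_every, fail_at=None):
    """Sum 0..total_steps-1, resuming from a checkpoint if one exists.

    `fail_at` is a hypothetical hook that simulates a transient fault.
    """
    # Rollback recovery: restart from the last checkpoint, not from zero.
    state = load_checkpoint(path) or {"step": 0, "acc": 0}
    step, acc = state["step"], state["acc"]
    while step < total_steps:
        if fail_at is not None and step == fail_at:
            raise RuntimeError("simulated transient fault")
        acc += step
        step += 1
        # User-chosen checkpoint placement and user-chosen contents.
        if step % checkpoint_every == 0:
            save_checkpoint(path, {"step": step, "acc": acc})
    return acc
```

A system-level scheme would instead save the whole process image without any such calls in the application code. In this sketch only two integers are saved, which illustrates why user-defined checkpoints can be much smaller, at the cost of programmer effort.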
Keywords :
"Checkpointing","Programming profession","Operating systems","Program processors","Fault tolerance","Runtime library","Fault tolerant systems","Communication channels"
Conference_Title :
Reliable Distributed Systems, 1998. Proceedings. Seventeenth IEEE Symposium on
Print_ISBN :
0-8186-9218-9
DOI :
10.1109/RELDIS.1998.740476