DocumentCode :
289022
Title :
The performance of independent checkpointing in distributed systems
Author :
Sens, Pierre
Author_Institution :
MASI Lab., Paris VI Univ., France
Volume :
2
fYear :
1995
fDate :
3-6 Jan 1995
Firstpage :
525
Abstract :
The paper describes performance measurements of an implementation of independent checkpointing in a network of workstations. Independent checkpointing is a simple technique for providing fault tolerance in distributed systems. Because processes do not coordinate during checkpointing, this technique has a low run-time overhead. To avoid the classical domino effect, our implementation relies on a message logging mechanism. We have measured fault management overhead for different kinds of parallel applications. The costs of checkpointing are very low. However, message logging introduces a sizeable overhead. We compare these results to other works implementing different checkpointing policies, and we show that independent checkpointing is an efficient way to provide fault tolerance for long-running distributed applications composed of processes exchanging small streams of data.
Keywords :
Unix; distributed processing; fault tolerant computing; local area networks; parallel processing; performance evaluation; system recovery; workstations; distributed systems; fault management overhead; fault tolerance; independent checkpointing; long-running distributed applications; message logging mechanism; parallel applications; performance; run-time overhead; small data streams; workstations network; Checkpointing; Costs; Fault tolerance; Fault tolerant systems; Hardware; Local area networks; Measurement; Operating systems; Runtime; Workstations;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
System Sciences, 1995. Proceedings of the Twenty-Eighth Hawaii International Conference on
Conference_Location :
Wailea, HI
Print_ISBN :
0-8186-6930-6
Type :
conf
DOI :
10.1109/HICSS.1995.375504
Filename :
375504