DocumentCode
289022
Title
The performance of independent checkpointing in distributed systems
Author
Sens, Pierre
Author_Institution
MASI Lab., Paris VI Univ., France
Volume
2
fYear
1995
fDate
3-6 Jan 1995
Firstpage
525
Abstract
The paper describes performance measurements of an implementation of independent checkpointing in a network of workstations. Independent checkpointing is a simple technique for providing fault tolerance in distributed systems. Because processes do not coordinate during checkpointing, this technique has low run-time overhead. To avoid the classical domino effect, our implementation relies on a message logging mechanism. We have measured fault management overhead for different kinds of parallel applications. The costs of checkpointing are very low. However, message logging introduces a sizeable overhead. We compare these results to other works implementing different checkpointing policies, and we show that independent checkpointing is an efficient way to provide fault tolerance for long-running distributed applications composed of processes exchanging small streams of data.
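To illustrate the mechanism the abstract describes, here is a minimal Python sketch. It assumes receiver-side (pessimistic) message logging and piecewise-deterministic processes; the class and method names (`Process`, `receive`, `take_checkpoint`, `crash`, `recover`) are hypothetical, and this is not the paper's implementation. Each process saves its state without coordinating with any other process and logs every message it receives. After a crash, only the failed process rolls back: it restores its last checkpoint and replays the logged messages. No sender is forced to roll back, so the domino effect cannot occur.

```python
import copy


class Process:
    """A process that checkpoints independently and logs received messages.

    Assumption (hypothetical, for illustration): receiver-side pessimistic
    logging, with "stable storage" modeled as in-memory fields.
    """

    def __init__(self, name):
        self.name = name
        self.state = {"sum": 0, "count": 0}          # application state
        self.checkpoint = copy.deepcopy(self.state)  # last saved local state
        self.log = []                                # messages received since that checkpoint

    def receive(self, msg):
        # Log before delivery so the message can be replayed after a crash.
        self.log.append(msg)
        self._apply(msg)

    def _apply(self, msg):
        # Deterministic handler: the same messages in the same order
        # always reproduce the same state.
        self.state["sum"] += msg
        self.state["count"] += 1

    def take_checkpoint(self):
        # Independent checkpoint: no coordination with other processes.
        # Messages already reflected in the checkpoint no longer need logging.
        self.checkpoint = copy.deepcopy(self.state)
        self.log.clear()

    def crash(self):
        # Volatile state is lost; checkpoint and log survive.
        self.state = None

    def recover(self):
        # Roll back only this process, then replay its log (without re-logging).
        self.state = copy.deepcopy(self.checkpoint)
        for msg in self.log:
            self._apply(msg)


if __name__ == "__main__":
    p = Process("p1")
    for m in [1, 2, 3]:
        p.receive(m)
    p.take_checkpoint()
    for m in [10, 20]:
        p.receive(m)
    before = dict(p.state)
    p.crash()
    p.recover()
    print(before, p.state)  # both {'sum': 36, 'count': 5}
```

A real system would write the checkpoint and the log to stable storage and would suppress messages re-sent during replay; this sketch keeps both in memory and sends no messages. The logging step on every receive is where the run-time cost of message logging appears.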
Keywords
Unix; distributed processing; fault tolerant computing; local area networks; parallel processing; performance evaluation; system recovery; workstations; distributed systems; fault management overhead; fault tolerance; independent checkpointing; long-running distributed applications; message logging mechanism; parallel applications; performance; run-time overhead; small data streams; workstations network; Checkpointing; Costs; Fault tolerance; Fault tolerant systems; Hardware; Local area networks; Measurement; Operating systems; Runtime; Workstations
fLanguage
English
Publisher
IEEE
Conference_Titel
System Sciences, 1995. Proceedings of the Twenty-Eighth Hawaii International Conference on
Conference_Location
Wailea, HI
Print_ISBN
0-8186-6930-6
Type
conf
DOI
10.1109/HICSS.1995.375504
Filename
375504