DocumentCode
1147999
Title
ickp: a consistent checkpointer for multicomputers
Author
Plank, James S. ; Li, Kai
Author_Institution
Tennessee Univ., Knoxville, TN, USA
Volume
2
Issue
2
fYear
1994
Firstpage
62
Lastpage
67
Abstract
There has been much research on checkpointing algorithms for parallel and distributed systems, but surprisingly few implementations for uniprocessors, multiprocessors, and distributed systems, and none at all for multicomputers. We discuss ickp, our consistent checkpointer for the Intel iPSC/860, which is the first general-purpose checkpointer for a multicomputer. It is a checkpointing library that may be invoked asynchronously from the host processor, at a periodic interval, or by a library call. It implements three consistent checkpointing algorithms, two optimizations to reduce checkpoint time and overhead, and recovery.
Keywords
fault tolerant computing; message passing; parallel processing; program diagnostics; software reliability; system recovery; Intel iPSC/860; checkpoint time; checkpointing algorithms; checkpointing library; consistent checkpointer; distributed systems; general-purpose checkpointer; host processor; ickp; library call; multicomputers; optimizations; overhead; parallel systems; periodic interval; recovery; Automatic control; Checkpointing; Concurrent computing; Distributed computing; Fault tolerance; Fault tolerant systems; File systems; Libraries; Parallel processing; Registers;
fLanguage
English
Journal_Title
Parallel & Distributed Technology: Systems & Applications, IEEE
Publisher
IEEE
ISSN
1063-6552
Type
jour
DOI
10.1109/88.311574
Filename
311574