• DocumentCode
    1147999
  • Title
    ickp: a consistent checkpointer for multicomputers
  • Author
    Plank, James S.; Li, Kai
  • Author_Institution
    Tennessee Univ., Knoxville, TN, USA
  • Volume
    2
  • Issue
    2
  • fYear
    1994
  • Firstpage
    62
  • Lastpage
    67
  • Abstract
    There has been much research on checkpointing algorithms for parallel and distributed systems, but surprisingly few implementations for uniprocessors, multiprocessors, and distributed systems, and none at all for multicomputers. We discuss ickp, our consistent checkpointer for the Intel iPSC/860, which is the first general-purpose checkpointer for a multicomputer. It is a checkpointing library that may be invoked asynchronously from the host processor, at a periodic interval, or by a library call. It implements three consistent checkpointing algorithms, two optimizations to reduce checkpoint time and overhead, and recovery.
  • Keywords
    fault tolerant computing; message passing; parallel processing; program diagnostics; software reliability; system recovery; Intel iPSC/860; checkpoint time; checkpointing algorithms; checkpointing library; consistent checkpointer; distributed systems; general-purpose checkpointer; host processor; ickp; library call; multicomputers; optimizations; overhead; parallel systems; periodic interval; recovery; Automatic control; Checkpointing; Concurrent computing; Distributed computing; Fault tolerance; Fault tolerant systems; File systems; Libraries; Parallel processing; Registers;
  • fLanguage
    English
  • Journal_Title
    Parallel & Distributed Technology: Systems & Applications, IEEE
  • Publisher
    IEEE
  • ISSN
    1063-6552
  • Type
    jour
  • DOI
    10.1109/88.311574
  • Filename
    311574