• DocumentCode
    959471
  • Title
    An efficient protocol for checkpointing recovery in distributed systems
  • Author
    Kim, Junguk L.; Park, Taesoon
  • Author_Institution
    Dept. of Comput. Sci., Texas A&M Univ., College Station, TX, USA
  • Volume
    4
  • Issue
    8
  • fYear
    1993
  • fDate
    8/1/1993 12:00:00 AM
  • Firstpage
    955
  • Lastpage
    960
  • Abstract
    The authors present an efficient synchronized checkpointing protocol that exploits the dependency relation between processes in distributed systems. In this protocol, a process takes a checkpoint when it knows that all processes on which it computationally depends have taken their checkpoints; hence, the process need not always wait for the decision made by the checkpointing coordinator, as in conventional synchronized protocols. As a result, the checkpointing coordination time is substantially reduced, and a total abort of the checkpointing coordination becomes less likely.
  • Keywords
    distributed processing; protocols; synchronisation; system recovery; checkpointing coordinator; checkpointing recovery; dependency relation; distributed systems; synchronized checkpointing protocol; Checkpointing; Computer science; Delay effects; Distributed computing; Fault tolerant systems; Propagation delay; Protocols; Resumes
  • fLanguage
    English
  • Journal_Title
    Parallel and Distributed Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1045-9219
  • Type
    jour
  • DOI
    10.1109/71.238629
  • Filename
    238629
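
  • Illustration
    The decision rule stated in the abstract (a process checkpoints once every process it depends on has checkpointed) can be sketched roughly as below. This is a minimal sketch and not the authors' protocol: the function name `checkpoint_rounds`, the round-based model, the process names, and the assumption of an acyclic dependency graph are all hypothetical, and message passing, the coordinator, and abort handling are omitted.

```python
def checkpoint_rounds(depends_on):
    """Return the round in which each process takes its checkpoint.

    depends_on maps a process id to the set of process ids it
    computationally depends on. Hypothetical model: the graph is assumed
    to be acyclic and every dependency must itself be a key of the map.
    """
    done = {}  # process id -> round number of its checkpoint
    round_no = 0
    while len(done) < len(depends_on):
        # A process is ready once every process it depends on has
        # already checkpointed in an earlier round.
        ready = [p for p, deps in depends_on.items()
                 if p not in done and all(d in done for d in deps)]
        if not ready:
            raise ValueError("cyclic or unknown dependency: no process can proceed")
        for p in ready:
            done[p] = round_no
        round_no += 1
    return done


# Hypothetical example: P2 and P3 depend on P1; P4 depends on P2 and P3.
deps = {"P1": set(), "P2": {"P1"}, "P3": {"P1"}, "P4": {"P2", "P3"}}
rounds = checkpoint_rounds(deps)
print(rounds)
```

    In this toy model P2 and P3 checkpoint in the same round, each using only knowledge of P1, which hints at how a dependency-based rule could avoid waiting for a global decision.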