• DocumentCode
    1446320
  • Title
    Theoretical analysis for communication-induced checkpointing protocols with rollback-dependency trackability
  • Author
    Tsai, Jichiang; Kuo, Sy-Yen; Wang, Yi-Min
  • Author_Institution
    Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan
  • Volume
    9
  • Issue
    10
  • fYear
    1998
  • fDate
    10/1/1998
  • Firstpage
    963
  • Lastpage
    971
  • Abstract
    Rollback-Dependency Trackability (RDT) is the property that all rollback dependencies between local checkpoints are on-line trackable by means of a transitive dependency vector. In this paper, we address three fundamental issues in the design of communication-induced checkpointing protocols that ensure RDT. First, we prove that the following intuition, commonly assumed in the literature, is in fact false: if a protocol forces a checkpoint only under a stronger condition, then it must take at most as many forced checkpoints as a protocol based on a weaker condition. This result implies that the common approach of sharpening the checkpoint-inducing condition by piggybacking more control information on each message may not always yield a more efficient protocol. Next, we prove that there is no optimal on-line RDT protocol, that is, no protocol that takes fewer forced checkpoints than any other RDT protocol for all possible communication patterns. Finally, since comparing checkpoint-inducing conditions is not sufficient for comparing protocol performance, we present formal techniques for comparing the performance of several existing RDT protocols.
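    To make the transitive dependency vector concrete, here is a minimal Python sketch, offered as an illustration under stated assumptions and not as the authors' protocol or analysis. Each process keeps a vector D in which D[i] is the index of its own current checkpoint interval; the vector is piggybacked on every message, and a receiver merges it by entry-wise maximum. As the checkpoint-inducing condition the sketch uses FDAS (Fixed-Dependency-After-Send), a known RDT protocol chosen here only as an example: once a process has sent a message in its current interval, it forces a checkpoint before delivering any message that would raise an entry of its vector. The class `Process` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Process:
    """Hypothetical process that keeps a transitive dependency vector D."""

    pid: int
    n: int  # number of processes in the system
    D: List[int] = field(init=False, default_factory=list)
    sent_in_interval: bool = False  # has a message been sent in this interval?
    forced: int = 0  # count of forced (communication-induced) checkpoints
    checkpoints: List[Tuple[int, ...]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # D[pid] is the index of this process's current checkpoint interval;
        # D[k] for k != pid is the latest interval of P_k it transitively depends on.
        self.D = [0] * self.n

    def checkpoint(self) -> None:
        # Save the vector with the checkpoint, then begin the next interval.
        self.checkpoints.append(tuple(self.D))
        self.D[self.pid] += 1
        self.sent_in_interval = False

    def send(self) -> List[int]:
        # Piggyback a copy of the current dependency vector on the message.
        self.sent_in_interval = True
        return list(self.D)

    def receive(self, piggyback: List[int]) -> None:
        # FDAS (illustrative choice): after the first send in an interval,
        # D must stay fixed, so force a checkpoint if delivering this message
        # would raise any entry of D.
        if self.sent_in_interval and any(p > d for p, d in zip(piggyback, self.D)):
            self.forced += 1
            self.checkpoint()
        # Merge dependencies: entry-wise maximum.
        self.D = [max(p, d) for p, d in zip(piggyback, self.D)]


# Usage: two processes, P0 and P1.
p0, p1 = Process(0, 2), Process(1, 2)
p0.checkpoint()   # P0 starts interval 1: D0 = [1, 0]
m1 = p0.send()    # m1 carries [1, 0]
m2 = p1.send()    # P1 has now sent in its interval; m2 carries [0, 0]
p1.receive(m1)    # m1 would raise D1[0]: FDAS forces a checkpoint
p0.receive(m2)    # m2 adds no new dependency: no forced checkpoint
print(p0.D, p1.D, p0.forced, p1.forced)  # [1, 0] [1, 1] 0 1
```

    In this pattern P1 has already sent m2 when m1 arrives carrying a newer entry for P0, so the FDAS condition forces one checkpoint at P1, while P0 gains no new dependency from m2 and is not forced. Another checkpoint-inducing condition, evaluated on the same communication pattern, may force a different number of checkpoints; the abstract's point is that a stronger condition does not by itself guarantee fewer.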
  • Keywords
    distributed processing; protocols; software fault tolerance; system recovery; communication patterns; communication-induced checkpointing protocols; local checkpoints; protocol performance; rollback-dependency trackability; transitive dependency vector; Checkpointing; Communication networks; Communication system control; Computer networks; Distributed computing; Force control; Nonvolatile memory; Process control; Protocols; Sufficient conditions
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Parallel and Distributed Systems
  • Publisher
    IEEE
  • ISSN
    1045-9219
  • Type
    jour
  • DOI
    10.1109/71.730526
  • Filename
    730526