• DocumentCode
    2379348
  • Title

    On staggered checkpointing

  • Author

    Vaidya, Nitin H.

  • Author_Institution
    Dept. of Comput. Sci., Texas A&M Univ., College Station, TX, USA
  • fYear
    1996
  • fDate
    23-26 Oct 1996
  • Firstpage
    572
  • Lastpage
    580
  • Abstract
    A consistent checkpointing algorithm saves a consistent view of a distributed application's state on stable storage. The traditional consistent checkpointing algorithms require different processes to save their state at about the same time. This causes contention for the stable storage, potentially resulting in large overheads. Staggering the checkpoints taken by various processes can reduce checkpoint overhead. The paper presents a simple approach to arbitrarily stagger the checkpoints. The approach requires that the processes take consistent logical checkpoints, as compared to consistent physical checkpoints enforced by existing algorithms. Experimental results on nCube-2 are presented.
  • Keywords
    distributed algorithms; distributed memory systems; fault tolerant computing; hypercube networks; reliability; system recovery; checkpoint overhead reduction; consistent checkpointing algorithm; consistent logical checkpoints; distributed application state; nCube-2; stable storage; staggered checkpointing; Checkpointing; Communication system control; Computer science; Degradation; Delay; Frequency; Upper bound
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Processing, 1996., Eighth IEEE Symposium on
  • Conference_Location
    New Orleans, LA
  • Print_ISBN
    0-8186-7683-3
  • Type

    conf

  • DOI
    10.1109/SPDP.1996.570386
  • Filename
    570386