• DocumentCode
    2747969
  • Title
    The performance of coordinated and independent checkpointing
  • Author
    Silva, Luis Moura ; Silva, João Gabriel
  • Author_Institution
    Dept. de Engenharia Inf., Coimbra Univ., Portugal
  • fYear
    1999
  • fDate
    12-16 Apr 1999
  • Firstpage
    280
  • Lastpage
    284
  • Abstract
    Checkpointing is a very effective technique for tolerating failures in distributed and parallel applications. Existing algorithms in the literature fall into two main classes: coordinated and independent checkpointing. This paper presents an experimental study that compares the performance of these two classes of algorithms. The main conclusion of our study is that coordinated checkpointing is more efficient than independent checkpointing, and that none of the arguments against the performance of coordinated algorithms was verified in practice.
  • Keywords
    fault tolerant computing; performance evaluation; system recovery; checkpointing; coordinated checkpointing; distributed; fault tolerance; independent checkpointing; parallel; Bandwidth; Checkpointing; Electrical capacitance tomography; Parallel machines; Protocols; Runtime library; Testing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel Processing, 1999. 13th International and 10th Symposium on Parallel and Distributed Processing, 1999. 1999 IPPS/SPDP. Proceedings
  • Conference_Location
    San Juan
  • Print_ISBN
    0-7695-0143-5
  • Type
    conf
  • DOI
    10.1109/IPPS.1999.760487
  • Filename
    760487