• DocumentCode
    2601046
  • Title

    Lazy checkpoint coordination for bounding rollback propagation

  • Author

    Wang, Yi-Min ; Fuchs, W. Kent

  • Author_Institution
    Univ. of Illinois at Urbana-Champaign, IL, USA
  • fYear
    1993
  • fDate
    6-8 Oct 1993
  • Firstpage
    78
  • Lastpage
    85
  • Abstract
    The technique of lazy checkpoint coordination, which preserves process autonomy while employing communication-induced checkpoint coordination for bounding rollback propagation, is proposed. The notion of laziness is introduced to control the coordination frequency and allow a flexible tradeoff between the cost of checkpoint coordination and the average rollback distance. Worst-case overhead analysis provides a means for estimating the extra checkpoint overhead. Communication trace-driven simulation for several parallel programs is used to evaluate the benefits of the proposed scheme.
  • Keywords
    fault tolerant computing; parallel programming; system monitoring; system recovery; average rollback distance; checkpoint overhead; communication-induced checkpoint coordination; coordination frequency; lazy checkpoint coordination; parallel programs; process autonomy; rollback propagation; Checkpointing; Contracts; Costs; Frequency measurement; History; Laboratories; Message passing; NASA; Performance evaluation; Runtime
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Reliable Distributed Systems, 1993. Proceedings., 12th Symposium on
  • Conference_Location
    Princeton, NJ
  • Print_ISBN
    0-8186-4310-2
  • Type

    conf

  • DOI
    10.1109/RELDIS.1993.393471
  • Filename
    393471