• DocumentCode
    2589573
  • Title

    On the optimum recovery of distributed programs

  • Author

    Silva, Luis Moura ; Silva, João Gabriel

  • Author_Institution
    Coimbra Univ., Portugal
  • fYear
    1994
  • fDate
    5-8 Sep 1994
  • Firstpage
    704
  • Lastpage
    711
  • Abstract
    In a previous paper (1992), the authors presented a checkpointing algorithm for distributed applications. That algorithm is based on a non-blocking coordinated global checkpoint of the distributed program. However, the associated rollback algorithm does not provide the best results, since in most cases it forces all the processes to roll back. This paper presents two algorithms for roll-back-recovery that minimize the number of processes that need to roll back. One of the algorithms is oriented to systems that use message logging, while the other is more general and can be used in systems that rely only on a coordinated checkpoint and do not log messages. We show that our proposal achieves the optimum result in minimizing the number of processes that have to roll back.
  • Keywords
    parallel algorithms; parallel programming; software fault tolerance; system recovery; checkpointing; distributed algorithms; distributed programs; distributed systems; fault-tolerance; message logging; optimum recovery; roll-back-recovery; vector time; Checkpointing; Computer applications; Computer crashes; Computer hacking; Concurrent computing; Delay; Distributed algorithms; Distributed computing; Fault tolerance; Proposals
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    EUROMICRO 94. System Architecture and Integration. Proceedings of the 20th EUROMICRO Conference.
  • Conference_Location
    Liverpool
  • Print_ISBN
    0-8186-6430-4
  • Type

    conf

  • DOI
    10.1109/EURMIC.1994.390340
  • Filename
    390340