• DocumentCode
    2898501
  • Title
    Using message semantics for fast-output commit in checkpointing-and-rollback recovery

  • Author
    Silva, L.M. ; Silva, J.G.

  • Author_Institution
    Dept. de Engenharia Inf., Coimbra Univ., Portugal
  • Volume
    Track8
  • fYear
    1999
  • fDate
    5-8 Jan. 1999
  • Abstract
    Checkpointing is a very effective technique for ensuring the continuity of long-running applications in the presence of failures. However, one of the handicaps of coordinated checkpointing is the high latency of committing output from the application to the external world. Enhancing the checkpointing scheme with a message logging protocol is a good way to reduce the output latency. The idea is to track the sources of nondeterminism so that the application can be replayed in a reproducible way during rollback recovery. We present a new event logging scheme that logs only those messages that may be delivered nondeterministically to the application. While other schemes keep track of the arrival order of all messages, we save only the delivery order of some of them. Our scheme exploits the semantics of message passing and considerably reduces the number of receiving events when compared with existing schemes. We present performance results that compare the output latency of coordinated checkpointing, pessimistic message logging, optimistic message logging, and our event logging scheme.
  • Keywords
    message passing; program testing; software fault tolerance; system recovery; arrival order; checkpointing scheme; checkpointing-and-rollback recovery; coordinated checkpointing; delivery order; event logging scheme; failures; fast-output commit; long running applications; message logging protocol; message passing; message semantics; non determinism; optimistic message logging; output latency; pessimistic message logging; receiving events; Checkpointing; Data visualization; Humans; Printers; Protocols; Writing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Systems Sciences, 1999. HICSS-32. Proceedings of the 32nd Annual Hawaii International Conference on
  • Conference_Location
    Maui, HI, USA
  • Print_ISBN
    0-7695-0001-3
  • Type
    conf
  • DOI
    10.1109/HICSS.1999.772986
  • Filename
    772986