• DocumentCode
    3270703
  • Title

    Fault tolerant distributed computing using atomic send-receive checkpoints

  • Author

    Wójcik, Zbigniew M. ; Wójcik, Barbara E.

  • Author_Institution
    Div. of Math., Comput. Sci. & Stat., Texas Univ., San Antonio, TX, USA
  • fYear
    1990
  • fDate
    9-13 Dec 1990
  • Firstpage
    215
  • Lastpage
    222
  • Abstract
    The paper presents a deadlock-free fault recovery algorithm for a fully distributed system in which messages need not arrive in the order in which they were sent. The method is based on asynchronous, atomic checkpointing of the sender and receiver of a message. Messages not balanced in the last permanent checkpoints are recorded in the new checkpoints. Fault recovery is based on (a) repeating all lost messages according to a record of unbalanced messages in the last permanent checkpoints, and (b) undoing every message re-sent during the fault recovery, or undoing a computation repeated according to a record of unbalanced messages in the last permanent checkpoints. Fault recovery involves only processes that communicated before a failure. A distributed computation may be split into a few segments without affecting transaction consistency. The algorithm involves the minimum number of messages. A proof of the resilience of the fault recovery algorithm is presented.
  • Keywords
    distributed processing; fault tolerant computing; system recovery; transaction processing; asynchronous messages; atomic checkpointing; atomic send-receive checkpoints; checkpoint consistency; deadlock free fault recovery algorithm; last permanent checkpoints; transaction consistency; unbalanced messages; Checkpointing; Computer science; Distributed computing; Error correction; Fault detection; Fault tolerance; Mathematics; Resilience; Statistical distributions; System recovery
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Processing, 1990. Proceedings of the Second IEEE Symposium on
  • Conference_Location
    Dallas, TX
  • Print_ISBN
    0-8186-2087-0
  • Type

    conf

  • DOI
    10.1109/SPDP.1990.143536
  • Filename
    143536