• DocumentCode
    1941872
  • Title
    Units of computation in fault-tolerant distributed systems
  • Author
    Ahuja, Mohan; Mishra, Shivakant
  • Author_Institution
    Dept. of Comput. Sci. & Eng., California Univ., San Diego, La Jolla, CA, USA
  • fYear
    1994
  • fDate
    21-24 Jun 1994
  • Firstpage
    626
  • Lastpage
    633
  • Abstract
    We develop a framework that aids in understanding fault-tolerant distributed systems and thereby in designing them. We define a unit of computation in such systems, referred to as a molecule, that has a well-defined interface with other molecules, i.e., has minimal dependence on other molecules. The smallest such unit, an indivisible molecule, is termed an atom. We show that any execution of a fault-tolerant distributed computation can be seen as an execution of molecules/atoms in a partial order. Such a view provides insight into the computation, particularly for a fault-tolerant system, where it is important to guarantee that a unit of computation is either executed completely or not at all, and where system designers need to reason about the states after execution of such units. We prove different properties satisfied by molecules and atoms, and present algorithms to detect atoms in an ongoing computation and to force the completion of a molecule. We illustrate the uses of the developed work in application areas such as debugging, checkpointing, and reasoning about stable properties.
  • Keywords
    distributed algorithms; distributed processing; fault tolerant computing; program debugging; reliability; atom; checkpointing; debugging; fault-tolerant distributed systems; indivisible molecule; molecule; ongoing computation; partial order; reasoning; stable properties; units of computation; Checkpointing; Computer interfaces; Computer science; Debugging; Design engineering; Distributed computing; Fault tolerant systems; Modems; Sun
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Distributed Computing Systems, 1994., Proceedings of the 14th International Conference on
  • Conference_Location
    Poznan
  • Print_ISBN
    0-8186-5840-1
  • Type
    conf
  • DOI
    10.1109/ICDCS.1994.302480
  • Filename
    302480