DocumentCode
1941872
Title
Units of computation in fault-tolerant distributed systems
Author
Ahuja, Mohan ; Mishra, Shivakant
Author_Institution
Dept. of Comput. Sci. & Eng., California Univ., San Diego, La Jolla, CA, USA
fYear
1994
fDate
21-24 Jun 1994
Firstpage
626
Lastpage
633
Abstract
We develop a framework that aids in understanding fault-tolerant distributed systems and therefore in designing them. We define a unit of computation in such systems, referred to as a molecule, that has a well-defined interface with other molecules, i.e., has minimal dependence on other molecules. The smallest such unit, an indivisible molecule, is termed an atom. We show that any execution of a fault-tolerant distributed computation can be seen as an execution of molecules/atoms in a partial order. Such a view provides insight into the computation, particularly for a fault-tolerant system, where it is important to guarantee that a unit of computation is executed either completely or not at all, and where system designers need to reason about the states reached after such units execute. We prove several properties satisfied by molecules and atoms, and present algorithms to detect atoms in an ongoing computation and to force the completion of a molecule. We illustrate uses of this work in application areas such as debugging, checkpointing, and reasoning about stable properties.
Keywords
distributed algorithms; distributed processing; fault tolerant computing; program debugging; reliability; atom; checkpointing; debugging; fault-tolerant distributed systems; indivisible molecule; molecule; ongoing computation; partial order; reasoning; stable properties; units of computation; Checkpointing; Computer interfaces; Computer science; Debugging; Design engineering; Distributed computing; Fault tolerant systems; Modems; Sun
fLanguage
English
Publisher
IEEE
Conference_Titel
Distributed Computing Systems, 1994., Proceedings of the 14th International Conference on
Conference_Location
Poznan
Print_ISBN
0-8186-5840-1
Type
conf
DOI
10.1109/ICDCS.1994.302480
Filename
302480