Title :
Selecting a “primary partition” in partitionable asynchronous distributed systems
Author :
Bartoli, Alberto ; Babaoglu, Ozalp
Author_Institution :
Dipt. Ingegneria dell'Inf., Pisa Univ., Italy
Abstract :
We consider network applications that are based on the process group paradigm. When such applications are deployed over networks that are subject to failures, they may partition into several disconnected clusters, so that multiple views of the group's current composition exist concurrently. Application semantics determine which operations, if any, can be performed in different partitions without compromising consistency. For certain application classes, most (possibly all) operations need to be confined to a single primary partition, while other partitions are allowed to service only a (possibly empty) subset of the operations. We propose a mechanism for deciding when a view constitutes the primary partition for the group. Our solution is highly flexible and has the following novel features: each group member can establish whether it belongs to the primary partition based solely on local information; the group can be dynamic, as processes voluntarily join and leave it; the selection rule for establishing the primary partition need not be universal but can be decided on a per-application basis and can be modified at run time; and the primary partition can be re-established even after total failures. Layering our solution on top of a partitionable group membership service allows a wide range of applications with different and possibly conflicting notions of “primary partition” to be supported on a common computing base.
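The abstract does not give the protocol itself. As a rough illustration only, the sketch below shows what a per-application selection rule, evaluated on a member's local state and replaceable at run time, might look like. Everything in it is an assumption: the names `Member`, `on_view_change`, `set_rule`, `majority_of_last_primary` and `static_quorum` are hypothetical, and the majority-of-last-primary rule is a common textbook choice, not the authors' rule. The sketch does not handle total failures, nor does it keep different members' local notions of the last primary view consistent in an asynchronous system; it shows only the shape of the interface, not a correct protocol.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet

# Hypothetical types: a view is the set of process ids it contains.
View = FrozenSet[str]
# A selection rule decides, from local state only, whether a new view is primary.
SelectionRule = Callable[[View, View], bool]


def majority_of_last_primary(new_view: View, last_primary: View) -> bool:
    """Assumed rule: primary if the new view holds a strict majority of the last primary view."""
    return len(new_view & last_primary) * 2 > len(last_primary)


def static_quorum(quorum: View) -> SelectionRule:
    """Assumed rule factory: primary iff a fixed, application-chosen set of members is present."""
    def rule(new_view: View, last_primary: View) -> bool:
        return quorum <= new_view
    return rule


@dataclass
class Member:
    """Hypothetical group member; not the authors' mechanism."""
    pid: str
    last_primary: View  # locally remembered primary view
    rule: SelectionRule = majority_of_last_primary

    def on_view_change(self, new_view: View) -> bool:
        """Invoked when the membership service delivers a new view; uses local state only."""
        primary = self.rule(new_view, self.last_primary)
        if primary:
            self.last_primary = new_view  # record the new primary locally
        return primary

    def set_rule(self, rule: SelectionRule) -> None:
        """Replace the selection rule at run time."""
        self.rule = rule
```

For example, a member whose last primary view was `{p1, p2, p3}` accepts a new view `{p1, p2}` (two of three members), but then rejects `{p1}` (one of two is not a strict majority) until the application switches to a rule such as `static_quorum(frozenset({"p1"}))`.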
Keywords :
computer network management; data integrity; fault tolerant computing; performance evaluation; reliability; application classes; application semantics; common computing base; disconnected clusters; group member; local information; multiple views; network applications; partitionable asynchronous distributed systems; partitionable group membership service; primary partition; process group paradigm; selection rule; single primary partition; total failures; Computer crashes; Computer science; Electronic mail; Intelligent networks; Partitioning algorithms; Runtime;
Conference_Title :
Proceedings of the Sixteenth Symposium on Reliable Distributed Systems, 1997
Conference_Location :
Durham, NC
Print_ISBN :
0-8186-8177-2
DOI :
10.1109/RELDIS.1997.632809