Title :
Consensus-based fault-tolerant total order multicast
Author :
Fritzke, Udo, Jr. ; Ingels, Philippe ; Mostefaoui, Achour ; Raynal, Michel
Author_Institution :
IRISA, Rennes, France
fDate :
2/1/2001
Abstract :
While total order broadcast (or atomic broadcast) primitives have received a lot of attention, this paper concentrates on total order multicast to multiple groups in the context of asynchronous distributed systems in which processes may suffer crash failures. “Multicast to multiple groups” means that each message is sent to a subset of the process groups composing the system, and distinct messages may have distinct destination groups. “Total order” means that all message deliveries must be totally ordered. This paper investigates a consensus-based approach to this problem and proposes a protocol that implements this multicast primitive. The protocol is built on two underlying building blocks, namely uniform reliable multicast and uniform consensus. Its design is characterized by two properties. The first is a minimality property: only the sender of a message and the processes of its destination groups have to participate in the total order multicast of that message. The second is a locality property: no consensus execution has to involve processes belonging to distinct groups (i.e., consensus is executed on a “per group” basis). This locality property is particularly useful when the total order multicast primitive is used in large-scale distributed systems. In addition to a correctness proof, the paper suggests an improvement that reduces the cost of the protocol.
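With multiple destination groups, "all message deliveries must be totally ordered" is stronger than requiring every pair of processes to agree on the messages they both deliver. The sketch below is not the authors' protocol. It is a hypothetical checker, assuming the common formalization that the "delivered before" relation, collected across all processes' delivery logs, must be acyclic (so that a single global order extends every local log). The function names, process names (p, q, r) and message names (m1, m2, m3) are illustrative assumptions. The second example uses groups A = {p}, B = {q}, C = {r}, with m1 sent to A and C, m2 to A and B, and m3 to B and C. Every pair of processes agrees, yet no total order exists.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Optional

# Hypothetical input format: process id -> sequence of message ids it delivered.
Logs = Dict[str, List[str]]


def pairwise_consistent(deliveries: Logs) -> bool:
    """True if no two processes deliver two common messages in opposite orders."""
    pos = {p: {m: i for i, m in enumerate(log)} for p, log in deliveries.items()}
    procs = list(pos)
    for i, p in enumerate(procs):
        for q in procs[i + 1:]:
            common = pos[p].keys() & pos[q].keys()
            for m in common:
                for n in common:
                    if pos[p][m] < pos[p][n] and pos[q][m] > pos[q][n]:
                        return False
    return True


def global_delivery_order(deliveries: Logs) -> Optional[List[str]]:
    """Return one global order that extends every local log, or None if none exists."""
    preds: Dict[str, set] = {}
    for log in deliveries.values():
        for msg in log:
            preds.setdefault(msg, set())
        # Edges between consecutive deliveries suffice: their transitive
        # closure is exactly the process's full local delivery order.
        for earlier, later in zip(log, log[1:]):
            preds[later].add(earlier)
    try:
        return list(TopologicalSorter(preds).static_order())
    except CycleError:  # the union of local orders contains a cycle
        return None


consistent_logs = {"p1": ["m1", "m2"], "p2": ["m1", "m2"], "p3": ["m2"]}
cyclic_logs = {"p": ["m1", "m2"], "q": ["m2", "m3"], "r": ["m3", "m1"]}

print(global_delivery_order(consistent_logs))  # ['m1', 'm2']
print(pairwise_consistent(cyclic_logs))        # True: each pair shares only one message
print(global_delivery_order(cyclic_logs))      # None: m1 -> m2 -> m3 -> m1
```

The cyclic case shows why a multi-group protocol cannot order each group's messages independently. Some agreement must link the groups a message is addressed to. The abstract states that the paper achieves this while keeping every consensus execution within a single group.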
Keywords :
fault tolerant computing; protocols; asynchronous distributed systems; atomic broadcast; consensus-based approach; correctness proof; large-scale distributed systems; locality property; minimality property; protocol; total order broadcast; uniform consensus; uniform reliable multicast; Broadcasting; Computer crashes; Costs; Detectors; Fault tolerance; Fault tolerant systems; Large-scale systems; Multicast protocols
Journal_Title :
IEEE Transactions on Parallel and Distributed Systems