DocumentCode :
1812930
Title :
An optimal atomic broadcast protocol and an implementation framework
Author :
Ezhilchelvan, Paul ; Palmer, Doug ; Raynal, Michel
Author_Institution :
Dept. of Comput. Sci., Newcastle Univ., UK
fYear :
2003
fDate :
15-17 Jan. 2003
Firstpage :
32
Lastpage :
39
Abstract :
Atomic broadcast (where all processes deliver broadcast messages in the same order) is a very useful group communication primitive for building fault-tolerant distributed systems. This paper presents an atomic broadcast protocol that can be claimed to be optimal in terms of failure detection, resilience, and latency. The protocol requires only the weakest of the useful failure detectors for liveness, and permits up to (n-1)/2 processes to crash in a system of n processes; at most two communication steps and n broadcasts are needed in a run during which no process crashes or failure suspicions occur. We also introduce the notion of notifying broadcast, which can reduce the message overhead further in 'nice' runs in which all processes are operational and communication delays do not exceed the assumed bound. If nice runs persist, the average message overhead is just one broadcast. That is, the protocol incurs no message overhead for providing crash-tolerance if process failures and unanticipated fluctuations in communication delays do not occur. We are currently implementing our protocol as a CORBA (Common Object Request Broker Architecture) component. All known ORBs use IIOP as the standard protocol for inter-process communication, which in turn uses TCP/IP (Transmission Control Protocol/Internet Protocol) as the common transport protocol. It turns out that the notifying broadcast is easy to implement on top of the TCP transport layer.
Keywords :
distributed object management; software fault tolerance; transport protocols; CORBA; Common Object Request Broker Architecture; IIOP; TCP/IP; asynchronous distributed system; atomic broadcast protocol; communication delay; consensus; crash failure; crash-tolerance; failure-suspicion; fault-tolerant system; inter-process communication; latency; liveness; message overhead reduction; middleware; notifying broadcast; operational delay; optimal failure detection; resilience; system crash; transmission control protocol/Internet protocol; Broadcasting; Communication standards; Computer crashes; Delay; Detectors; Fault tolerant systems; Fluctuations; Resilience; TCPIP; Transport protocols;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Object-Oriented Real-Time Dependable Systems, 2003. (WORDS 2003). Proceedings of the Eighth International Workshop on
Print_ISBN :
0-7695-1929-6
Type :
conf
DOI :
10.1109/WORDS.2003.1218063
Filename :
1218063