Title :
Atomic broadcast in asynchronous crash-recovery distributed systems and its use in quorum-based replication
Author :
Rodrigues, Luís ; Raynal, Michel
Author_Institution :
Faculdade de Ciencias, Univ. de Lisboa, Portugal
Abstract :
Atomic broadcast is a fundamental problem of distributed systems: it states that messages must be delivered in the same order to their destination processes. This paper describes a solution to this problem in asynchronous distributed systems in which processes can crash and recover. A consensus-based solution to the atomic broadcast problem has been designed by Chandra and Toueg for asynchronous distributed systems where crashed processes do not recover. We extend this approach: our solution transforms any consensus protocol suited to the crash-recovery model into an atomic broadcast protocol suited to the same model. We show that atomic broadcast can be implemented with few log operations beyond those required by the consensus protocol. The paper also discusses how additional log operations can improve the protocol in terms of faster recovery and better throughput. To illustrate the use of the protocol, the paper also describes a solution to the replica management problem in asynchronous distributed systems in which processes can crash and recover. The proposed technique bridges established results on weighted voting and recent results on the consensus problem.
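The consensus-based approach mentioned in the abstract can be illustrated with a minimal single-machine sketch. It is not the paper's protocol: it omits crashes, recovery, logging, and message passing, and only shows the general idea of running consensus in rounds on batches of undelivered messages. The names `atomic_broadcast`, `pick_lowest_id`, and the `Consensus` type are hypothetical, and the toy consensus function is an assumption that stands in for a real consensus protocol.

```python
from typing import Callable, Dict, FrozenSet, List, Set

# Hypothetical stand-in for a consensus black box: given the round number and
# the proposals of all participating processes, return the single decided value.
Consensus = Callable[[int, Dict[str, FrozenSet[str]]], FrozenSet[str]]


def pick_lowest_id(round_no: int, proposals: Dict[str, FrozenSet[str]]) -> FrozenSet[str]:
    """Toy consensus (assumption): decide the proposal of the smallest process id."""
    return proposals[min(proposals)]


def atomic_broadcast(
    received: Dict[str, Set[str]], consensus: Consensus
) -> Dict[str, List[str]]:
    """Return, per process, the delivery order of all received messages.

    `received` maps a process id to the messages it has received but not yet
    delivered; different processes may hold different subsets.
    """
    # Work on a copy so the caller's sets are left untouched.
    pending = {p: set(msgs) for p, msgs in received.items()}
    delivered: Dict[str, List[str]] = {p: [] for p in pending}
    round_no = 0
    while any(pending.values()):
        round_no += 1
        # Each process that has undelivered messages proposes them as a batch.
        proposals = {p: frozenset(m) for p, m in pending.items() if m}
        batch = consensus(round_no, proposals)
        for p in pending:
            # Every process delivers the decided batch in the same
            # deterministic order. The sketch assumes a process can obtain
            # any decided message it has not received itself.
            delivered[p].extend(sorted(batch))
            pending[p] -= batch
    return delivered


if __name__ == "__main__":
    order = atomic_broadcast(
        {"p1": {"b", "a"}, "p2": {"c"}, "p3": {"a", "c"}}, pick_lowest_id
    )
    print(order)
```

Because every process applies the same decided batches in the same round order, and sorts each batch deterministically, all processes end up with identical delivery sequences even though they started with different sets of messages.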
Keywords :
broadcasting; message passing; protocols; software fault tolerance; system recovery; asynchronous crash-recovery distributed systems; atomic broadcast; atomic broadcast protocol; consensus protocol; consensus-based solution; crash-recovery model; log operations; message delivery; quorum-based replication; replica management problem; Bridges; Broadcasting; Computer crashes; Detectors; Fault tolerance; Fault tolerant systems; Protocols; Throughput; Voting
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
DOI :
10.1109/TKDE.2003.1232273