DocumentCode :
1897399
Title :
Customizable Fault Tolerance for Wide-Area Replication
Author :
Amir, Yair ; Coan, Brian ; Kirsch, Jonathan ; Lane, John
Author_Institution :
Johns Hopkins Univ., Baltimore
fYear :
2007
fDate :
10-12 Oct. 2007
Firstpage :
65
Lastpage :
82
Abstract :
Constructing logical machines out of collections of physical machines is a well-known technique for improving the robustness and fault tolerance of distributed systems. We present a new, scalable replication architecture, built upon logical machines specifically designed to perform well in wide-area systems spanning multiple sites. The physical machines in each site implement a logical machine by running a local state machine replication protocol, and a wide-area replication protocol runs among the logical machines. Implementing logical machines via the state machine approach affords free substitution of the fault tolerance method used in each site and in the wide-area replication protocol, allowing one to balance performance and fault tolerance based on perceived risk. We present a new Byzantine fault-tolerant protocol that establishes a reliable virtual communication link between logical machines. Our communication protocol is efficient (a necessity in wide-area environments), avoiding the need for redundant message sending during normal-case operation and allowing a logical machine to consume approximately the same wide-area bandwidth as a single physical machine. This dramatically improves the wide-area performance of our system compared to existing logical-machine-based approaches. We implemented a prototype system and compare its performance and fault tolerance to existing solutions.
Keywords :
fault tolerant computing; replicated databases; transport protocols; wide area networks; Byzantine fault-tolerant protocol; communication protocol; customizable fault tolerance; distributed system; local state machine replication protocol; logical machine; reliable virtual communication link; scalable replication architecture; wide-area replication protocol; wide-area system; Bandwidth; Fault tolerance; Fault tolerant systems; Information systems; Large-scale systems; Local area networks; Network servers; Protocols; Prototypes; Robustness
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Reliable Distributed Systems, 2007. SRDS 2007. 26th IEEE International Symposium on
Conference_Location :
Beijing
ISSN :
1060-9857
Print_ISBN :
0-7695-2995-X
Type :
conf
DOI :
10.1109/SRDS.2007.40
Filename :
4365685