Title :
Fully distributed three-tier active software replication
Author :
Marchetti, Carlo ; Baldoni, Roberto ; Tucci-Piergiovanni, Sara ; Virgillito, Antonino
Author_Institution :
Dipt. di Informatica e Sistemistica, Università degli Studi di Roma "La Sapienza"
fDate :
7/1/2006
Abstract :
Keeping the state of the replicas of a software service strongly consistent is a real practical challenge when the service is deployed across a distributed system prone to crashes and with highly unstable message transfer delays (e.g., the Internet). The solution to this problem is subject to the FLP impossibility result, and thus there is a need for "long enough" periods of synchrony, with time bounds on process speeds and message transfer delays, to ensure deterministic termination of any run of the agreement protocols executed by replicas. This behavior can be abstracted by a partially synchronous computational model. In this setting, before reaching a period of synchrony, the underlying network can arbitrarily delay messages, and a timeout-based failure detection mechanism can perceive these delays as false failures, leading to unexpected service unavailability. This paper proposes a fully distributed solution for active software replication based on a three-tier software architecture well-suited to such a difficult setting. The formal correctness of the solution is proved by assuming the middle-tier runs in a partially synchronous distributed system. This architecture separates the ordering of the requests coming from clients, executed by the middle-tier, from their actual execution, done by replicas, i.e., the end-tier. In this way, clients can show up in any part of the distributed system and replica placement is simplified, since only the middle-tier has to be deployed on a well-behaving part of the distributed system that frequently respects synchrony bounds. This deployment permits rapid timeout tuning, thus reducing unexpected service unavailability.
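The separation of ordering from execution described in the abstract can be illustrated with a minimal sketch. It is not the paper's protocol: the fully distributed, agreement-based middle-tier is replaced here by a single in-process `Sequencer`, and the names `Sequencer`, `Replica`, `order`, `deliver`, and `run_demo` are all hypothetical. The sketch only shows why replicas that execute requests in the middle-tier's order end in the same state even when the network delivers messages in different orders.

```python
from dataclasses import dataclass, field


@dataclass
class Sequencer:
    """Middle tier (hypothetical single-process stand-in): orders requests."""
    next_seq: int = 0
    assigned: dict = field(default_factory=dict)  # request id -> sequence number

    def order(self, request_id: str) -> int:
        # A re-submitted request keeps its original sequence number.
        if request_id not in self.assigned:
            self.assigned[request_id] = self.next_seq
            self.next_seq += 1
        return self.assigned[request_id]


@dataclass
class Replica:
    """End tier: a deterministic state machine that executes in sequence order."""
    state: int = 0
    next_to_run: int = 0
    pending: dict = field(default_factory=dict)  # sequence number -> operand

    def deliver(self, seq: int, amount: int) -> None:
        # Buffer the request unless it was already executed.
        if seq >= self.next_to_run:
            self.pending.setdefault(seq, amount)
        # Execute every buffered request that is now contiguous.
        while self.next_to_run in self.pending:
            # Deliberately non-commutative update, so execution order matters.
            self.state = self.state * 2 + self.pending.pop(self.next_to_run)
            self.next_to_run += 1


def run_demo():
    sequencer = Sequencer()
    requests = [("c1-r1", 3), ("c2-r1", 5), ("c1-r2", 7)]
    ordered = [(sequencer.order(rid), amount) for rid, amount in requests]
    a, b = Replica(), Replica()
    for seq, amount in ordered:            # replica a: in-order delivery
        a.deliver(seq, amount)
    for seq, amount in reversed(ordered):  # replica b: reversed delivery
        b.deliver(seq, amount)
    return a.state, b.state
```

Calling `run_demo()` returns the final states of the two replicas; they match because each replica buffers out-of-order deliveries and applies them strictly by sequence number. Fault tolerance of the ordering tier itself, which is the paper's actual contribution, is outside the scope of this sketch.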
Keywords :
fault diagnosis; fault tolerant computing; formal verification; message passing; software architecture; active software replication; message transfer delays; software service; synchronous computational model; synchronous distributed system; three-tier software architecture; timeout-based failure detection mechanism; Computational modeling; Computer architecture; Computer crashes; Delay effects; Distributed computing; Protocols; Software architecture; Software systems; Timing; Web and internet services; Dependable distributed systems; architectures for dependable services; replication protocols; software replication in wide-area networks
Journal_Title :
Parallel and Distributed Systems, IEEE Transactions on
DOI :
10.1109/TPDS.2006.89