DocumentCode
1284531
Title
Independent recovery in large-scale distributed systems
Author
Triantafillou, P.
Author_Institution
Dept. of Comput. Eng., Tech. Univ. of Crete, Chania
Volume
22
Issue
11
fYear
1996
fDate
11/1/1996
Firstpage
812
Lastpage
826
Abstract
In large systems, replication can become an important means to improve data access times and availability. Existing recovery protocols, on the other hand, were proposed for small-scale distributed systems. Such protocols typically update stale, newly recovered sites with replicated data and resolve the commit uncertainty of recovering sites. Thus, given that failures are more frequent and data access is costlier in large systems, such protocols can introduce large overheads there and should be avoided where possible. We call these protocols dependent recovery protocols, since they require a recovering site to consult with other sites. Independent recovery has been studied in the context of one-copy systems and has been proven unattainable. This paper offers independent recovery protocols for large-scale systems with replicated data. It shows how the protocols can be incorporated into several well-known replication protocols and proves that these protocols continue to ensure data consistency. The paper then addresses the issue of nonblocking atomic commitment. It presents mechanisms that can reduce the overhead of termination protocols and the probability of blocking. Finally, the performance impact of the proposed recovery protocols is studied through simulation and analytical studies. The results show that the significant benefits of independent recovery can be enjoyed with a very small loss in data availability and a very small increase in the number of transaction abortions.
Keywords
concurrency control; data integrity; distributed databases; protocols; replicated databases; software fault tolerance; software performance evaluation; system recovery; transaction processing; commit uncertainty; data access times; data availability; data consistency; independent recovery; large overheads; large-scale distributed systems; nonblocking atomic commitment; one-copy systems; performance impact; probability; recovery protocols; replicated database; replication protocols; simulation; termination protocols; transaction abortions; Abortion; Access protocols; Costs; Data engineering; Delay effects; Electronic mail; Large-scale systems; Terminology
fLanguage
English
Journal_Title
Software Engineering, IEEE Transactions on
Publisher
IEEE
ISSN
0098-5589
Type
jour
DOI
10.1109/32.553700
Filename
553700