• DocumentCode
    1284531
  • Title

    Independent recovery in large-scale distributed systems

  • Author

    Triantafillou, P.

  • Author_Institution
    Dept. of Comput. Eng., Tech. Univ. of Crete, Chania
  • Volume
    22
  • Issue
    11
  • fYear
    1996
  • fDate
    11/1/1996 12:00:00 AM
  • Firstpage
    812
  • Lastpage
    826
  • Abstract
    In large systems, replication can become an important means to improve data access times and availability. Existing recovery protocols, on the other hand, were proposed for small-scale distributed systems. Such protocols typically update stale, newly-recovered sites with replicated data and resolve the commit uncertainty of recovering sites. Thus, given that in large systems failures are more frequent and that data access times are costlier, such protocols can potentially introduce large overheads in large systems and must be avoided, if possible. We call these protocols dependent recovery protocols since they require a recovering site to consult with other sites. Independent recovery has been studied in the context of one-copy systems and has been proven unattainable. This paper offers independent recovery protocols for large-scale systems with replicated data. It shows how the protocols can be incorporated into several well-known replication protocols and proves that these protocols continue to ensure data consistency. The paper then addresses the issue of nonblocking atomic commitment. It presents mechanisms which can reduce the overhead of termination protocols and the probability of blocking. Finally, the performance impact of the proposed recovery protocols is studied through the use of simulation and analytical studies. The results of these studies show that the significant benefits of independent recovery can be enjoyed with a very small loss in data availability and a very small increase in the number of transaction abortions.
  • Keywords
    concurrency control; data integrity; distributed databases; protocols; replicated databases; software fault tolerance; software performance evaluation; system recovery; transaction processing; commit uncertainty; data access times; data availability; data consistency; independent recovery; large overheads; large-scale distributed systems; nonblocking atomic commitment; one-copy systems; performance impact; probability; recovery protocols; replicated database; replication protocols; simulation; termination protocols; transaction abortions; Abortion; Access protocols; Costs; Data engineering; Delay effects; Electronic mail; Large-scale systems; Terminology
  • fLanguage
    English
  • Journal_Title
    Software Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0098-5589
  • Type

    jour

  • DOI
    10.1109/32.553700
  • Filename
    553700