• DocumentCode
    1398910
  • Title
    A case for two-level recovery schemes
  • Author
    Vaidya, Nitin H.
  • Author_Institution
    Dept. of Comput. Sci., Texas A&M Univ., College Station, TX, USA
  • Volume
    47
  • Issue
    6
  • fYear
    1998
  • fDate
    6/1/1998 12:00:00 AM
  • Firstpage
    656
  • Lastpage
    666
  • Abstract
    Long-running applications are often subject to failures. Failures can result in significant loss of computation; therefore, it is necessary to use a failure recovery scheme to minimize performance overhead in the presence of failures. In this paper, we argue that it is often advantageous to use “two-level” recovery schemes. A two-level recovery scheme tolerates the more probable failures with low performance overhead, while the less probable failures may incur a higher overhead. By minimizing overhead for the more frequently occurring failure scenarios, the two-level approach can achieve lower average performance overhead than existing recovery schemes. The paper describes two two-level recovery schemes. Performance analysis using a Markov chain shows that, in practice, a two-level scheme can perform better than its “one-level” counterpart. While the conclusions of this paper are intuitive, work on the design of appropriate recovery schemes is lacking. The objective of this paper is to motivate research into recovery schemes that can provide multiple levels of fault tolerance and achieve better performance than existing recovery schemes. The paper presents an analytical approach for evaluating the performance of two-level schemes and shows that such schemes are hard to optimize analytically.
  • Keywords
    Markov processes; fault tolerant computing; system recovery; Markov chain; failure recovery scheme; performance overhead; probable failures; two-level recovery schemes; Application software; Checkpointing; Computer aided software engineering; Databases; Failure analysis; Fault tolerance; Fault tolerant systems; Guidelines; Performance analysis; Protection;
  • fLanguage
    English
  • Journal_Title
    Computers, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9340
  • Type
    jour
  • DOI
    10.1109/12.689645
  • Filename
    689645
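The averaging argument in the abstract (cheap recovery for the more probable failures, possibly costlier recovery for the rare ones) can be illustrated with a minimal Python sketch. It is not the paper's Markov-chain model. It assumes a hypothetical split of failures into two classes with fixed per-failure recovery costs, ignores failure-free checkpointing overhead, and uses made-up names (`Scheme`, `expected_cost`, `break_even_probability`) and made-up numbers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scheme:
    """Hypothetical recovery scheme described only by per-failure recovery cost.

    Assumption: failures fall into two classes, a frequent ("probable") one and
    an infrequent ("rare") one; the paper's actual model is a Markov chain.
    """
    name: str
    cost_probable: float  # cost to recover from a frequent failure
    cost_rare: float      # cost to recover from an infrequent failure


def expected_cost(scheme: Scheme, p_probable: float) -> float:
    """Average recovery cost per failure, weighting each class by its probability."""
    if not 0.0 <= p_probable <= 1.0:
        raise ValueError("p_probable must lie in [0, 1]")
    return p_probable * scheme.cost_probable + (1.0 - p_probable) * scheme.cost_rare


def break_even_probability(a: Scheme, b: Scheme) -> Optional[float]:
    """Probability of a 'probable' failure at which both schemes cost the same.

    Solves p*a1 + (1-p)*b1 = p*a2 + (1-p)*b2 for p; returns None if the
    expected-cost lines are parallel (no unique crossing point).
    """
    denom = (a.cost_probable - a.cost_rare) - (b.cost_probable - b.cost_rare)
    if denom == 0:
        return None
    return (b.cost_rare - a.cost_rare) / denom


# Illustrative numbers only (not taken from the paper).
one_level = Scheme("one-level", cost_probable=5.0, cost_rare=5.0)
two_level = Scheme("two-level", cost_probable=1.0, cost_rare=20.0)

if __name__ == "__main__":
    for p in (0.5, 0.9, 0.99):
        print(f"p={p}: one-level={expected_cost(one_level, p):.2f}, "
              f"two-level={expected_cost(two_level, p):.2f}")
    print("break-even p =", break_even_probability(one_level, two_level))
```

With these invented costs, the two-level scheme has the lower average only when more than 15/19 (about 79%) of failures fall in the probable class, which is consistent with the abstract's wording that a two-level scheme "can" perform better in practice rather than always.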