  • DocumentCode
    3042103
  • Title
    Progressive retry for software error recovery in distributed systems
  • Author
    Wang, Yi-Min; Huang, Yennun; Fuchs, W. Kent
  • Author_Institution
    Coordinated Sci. Lab., Illinois Univ., Urbana, IL, USA
  • fYear
    1993
  • fDate
    22-24 June 1993
  • Firstpage
    138
  • Lastpage
    144
  • Abstract
    A method of execution retry for bypassing software faults based on checkpointing, rollback, message reordering, and replaying is described. The authors demonstrate how rollback techniques, previously developed for transient hardware failure recovery, can also be used to recover from software errors by exploiting message reordering to bypass software faults. The approach intentionally increases the degree of nondeterminism and the scope of rollback when a previous retry fails. Examples from experience with telecommunications software systems illustrate the benefits of the scheme.
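The abstract describes the escalation idea only at a high level. The Python sketch below illustrates it for a single process under assumptions that do not come from the paper. A handler that raises `SoftwareFault` stands in for a software fault. Checkpoints are `(state, log_index)` pairs over one message log. The three levels are plain replay, reordered replay, and reordered replay from an older checkpoint. `SoftwareFault`, `RetryLevel`, `replay`, `progressive_retry`, and the level names are all hypothetical. The paper's actual retry steps, and its handling of multiple communicating processes and consistent rollback across them, are not reproduced here.

```python
import itertools
from dataclasses import dataclass


class SoftwareFault(Exception):
    """Raised by a handler when delivery order triggers a (hypothetical) bug."""


@dataclass(frozen=True)
class RetryLevel:
    name: str
    rollback_depth: int  # checkpoints to roll back (1 = most recent)
    reorder: bool        # try alternative delivery orders of logged messages


def replay(state, messages, handler):
    """Re-deliver logged messages in the given order; faults propagate."""
    for msg in messages:
        state = handler(state, msg)
    return state


def progressive_retry(checkpoints, log, handler, levels, max_orders=120):
    """Try each level in turn, widening rollback scope and nondeterminism.

    checkpoints: list of (state, log_index), oldest first, where log_index is
    the position in `log` of the first message delivered after the checkpoint.
    Returns (level_name, recovered_state) for the first level that succeeds,
    or None if every level fails.
    """
    for level in levels:
        depth = min(level.rollback_depth, len(checkpoints))
        start_state, start_index = checkpoints[-depth]
        suffix = log[start_index:]
        if level.reorder:
            # Bounded enumeration of delivery orders (the first is the original).
            orders = itertools.islice(itertools.permutations(suffix), max_orders)
        else:
            orders = [tuple(suffix)]
        for order in orders:
            try:
                return level.name, replay(start_state, order, handler)
            except SoftwareFault:
                continue  # this order hit the fault; try the next one
    return None


# --- Usage with a hypothetical order-dependent bug ---------------------------
def handler(state, msg):
    # "c" crashes if "b" was processed and "a" was not processed before it.
    if msg == "c" and "b" in state and (
        "a" not in state or state.index("b") < state.index("a")
    ):
        raise SoftwareFault("c processed after b-before-a")
    return state + (msg,)


LEVELS = [
    RetryLevel("replay", 1, False),
    RetryLevel("reorder", 1, True),
    RetryLevel("wider rollback + reorder", 2, True),
]
log = ["b", "a", "c"]
checkpoints = [((), 0), (("b", "a"), 2)]  # second checkpoint taken after "b", "a"

print(progressive_retry(checkpoints, log, handler, LEVELS))
# ('wider rollback + reorder', ('a', 'b', 'c'))
```

In this toy run the faulty order ("b" before "a") is already captured in the most recent checkpoint. Replaying or reordering only the message delivered after that checkpoint cannot bypass the fault. Only the third level recovers, because it rolls back further and reorders the whole log. The enumeration of orders is capped by `max_orders` because the number of permutations grows factorially with the log length.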
  • Keywords
    software fault tolerance; checkpointing; distributed systems; execution retry; message reordering; nondeterminism; progressive retry; rollback; software error recovery; telecommunications software systems; transient hardware failure recovery; Checkpointing; Computer errors; Contracts; Costs; Hardware; NASA; Protocols; Runtime; Software systems; Testing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Fault-Tolerant Computing, 1993. FTCS-23. Digest of Papers., The Twenty-Third International Symposium on
  • Conference_Location
    Toulouse, France
  • ISSN
    0731-3071
  • Print_ISBN
    0-8186-3680-7
  • Type
    conf
  • DOI
    10.1109/FTCS.1993.627317
  • Filename
    627317