DocumentCode
3042103
Title
Progressive retry for software error recovery in distributed systems
Author
Wang, Yi-Min; Huang, Yennun; Fuchs, W. Kent
Author_Institution
Coordinated Sci. Lab., Illinois Univ., Urbana, IL, USA
fYear
1993
fDate
22-24 June 1993
Firstpage
138
Lastpage
144
Abstract
This paper describes an execution-retry method for bypassing software faults that is based on checkpointing, rollback, message reordering, and replay. The authors show that rollback techniques previously developed for recovering from transient hardware failures can also recover from software errors, by exploiting message reordering to bypass software faults. When a retry fails, the approach intentionally increases both the degree of nondeterminism and the scope of rollback for the next attempt. Examples drawn from experience with telecommunications software systems illustrate the benefits of the scheme.
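The escalation idea in the abstract (replay first, then add nondeterminism and widen the rollback only if the previous retry fails) can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the authors' algorithm: it assumes a software fault that is triggered by a particular message arrival order, models "rollback scope" as the number of trailing logged messages whose order may be permuted, and uses hypothetical names (progressive_retry, no_a_before_b, max_scope) that do not come from the paper.

```python
from itertools import permutations
from typing import Callable, List, Optional, Sequence

# Hypothetical handler type: returns True if processing the given order succeeds.
Handler = Callable[[List[str]], bool]


def progressive_retry(logged: Sequence[str], handler: Handler,
                      max_scope: int) -> Optional[List[str]]:
    """Toy escalation loop (assumed model, not the paper's exact steps).

    Returns the first message order under which the handler succeeds,
    or None if every step in this sketch fails.
    """
    original = list(logged)
    # First step: roll back to the checkpoint and replay the logged messages
    # in their original order. In a real system this succeeds only if the
    # failure was transient and is not reproduced by the same order.
    if handler(original):
        return original
    # Later steps: roll back further (a larger scope) and allow more
    # nondeterminism by trying other orders of the last `scope` messages.
    for scope in range(2, min(max_scope, len(original)) + 1):
        prefix, suffix = original[:-scope], original[-scope:]
        for perm in permutations(suffix):
            candidate = prefix + list(perm)
            if candidate != original and handler(candidate):
                return candidate
    # Beyond this sketch, a system would escalate further (e.g., a restart).
    return None


def no_a_before_b(order: List[str]) -> bool:
    # Hypothetical fault: processing crashes whenever "A" is immediately
    # followed by "B".
    return not any(x == "A" and y == "B" for x, y in zip(order, order[1:]))


print(progressive_retry(["A", "B", "C"], no_a_before_b, max_scope=3))
# prints ['A', 'C', 'B']
```

Here the plain replay fails deterministically, and the first reordering of the last two messages bypasses the fault. Bounding the scope keeps the number of tried orders small, mirroring the abstract's point that larger rollback and more nondeterminism are used only after cheaper retries fail.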
Keywords
software fault tolerance; checkpointing; distributed systems; execution retry; message reordering; nondeterminism; progressive retry; rollback; software error recovery; telecommunications software systems; transient hardware failure recovery; Computer errors; Contracts; Costs; Hardware; NASA; Protocols; Runtime; Software systems; Testing
fLanguage
English
Publisher
IEEE
Conference_Title
The Twenty-Third International Symposium on Fault-Tolerant Computing (FTCS-23), 1993. Digest of Papers
Conference_Location
Toulouse, France
ISSN
0731-3071
Print_ISBN
0-8186-3680-7
Type
conf
DOI
10.1109/FTCS.1993.627317
Filename
627317