Title :
Progressive retry for software failure recovery in message-passing applications
Author :
Yi-Min Wang ; Yennun Huang ; Kintala, C.
Author_Institution :
AT&T Labs-Res., NJ.
fDate :
10/1/1997 12:00:00 AM
Abstract :
A method of execution retry for bypassing software faults in message-passing applications is described in this paper. Based on the techniques of checkpointing and message logging, we demonstrate the use of message replaying and message reordering as two mechanisms for achieving localized and fast recovery. The approach gradually increases the rollback distance and the number of affected processes when a previous retry fails, and is therefore named progressive retry. Examples from telecommunications software systems and performance measurements from an application-level implementation are described to illustrate the benefits of the scheme.
Keywords :
computational complexity; message passing; software fault tolerance; sorting; system recovery; application-level implementation; checkpointing; execution retry; message logging; message reordering; message-passing applications; performance measurements; progressive retry; rollback distance; software failure recovery; software faults; telecommunications software systems; Application software; Availability; Checkpointing; Computer Society; Distributed computing; Measurement; Protocols; Software debugging; Software systems; Telecommunication computing;
Journal_Title :
Computers, IEEE Transactions on