DocumentCode
2611667
Title
Reducing message logging overhead for log-based recovery
Author
Wang, Yi-Min
Author_Institution
Coordinated Sci. Lab., Illinois Univ., Urbana, IL, USA
fYear
1993
fDate
3-6 May 1993
Firstpage
1925
Abstract
Checkpointing and rollback recovery is essential for long-running parallel applications. In the case of a transient fault or system crash, the affected application programs can recover from a consistent set of checkpoints saved earlier instead of restarting from the very beginning. For applications requiring transparent fault tolerance, log-based recovery can usually achieve a better recoverable state at the cost of message logging in addition to checkpointing. A simple scheme for reducing message logging overhead based on local dependency information is presented. Communication trace-driven simulation for several parallel applications is used to evaluate the benefits of the proposed scheme for real applications.
Keywords
fault tolerant computing; message passing; parallel processing; system recovery; checkpointing; communication trace-driven simulation; local dependency information; log-based recovery; long-running parallel applications; message logging overhead; recoverable state; rollback recovery; system crash; transient fault; transparent fault tolerance; Application software; Checkpointing; Computer crashes; Concurrent computing; Costs; Hardware; Protocols
fLanguage
English
Publisher
IEEE
Conference_Titel
Circuits and Systems, 1993., ISCAS '93, 1993 IEEE International Symposium on
Conference_Location
Chicago, IL
Print_ISBN
0-7803-1281-3
Type
conf
DOI
10.1109/ISCAS.1993.394126
Filename
394126