DocumentCode
3556947
Title
Crash recovery with little overhead
Author
Juang, Tony T.-Y.; Venkatesan, S.
Author_Institution
Comput. Sci. Program, Texas Univ. at Dallas, Richardson, TX, USA
fYear
1991
fDate
20-24 May 1991
Firstpage
454
Lastpage
461
Abstract
Recovering from processor failures in distributed systems is an important problem in the design and development of reliable systems. Two solutions to this problem that involve very little overhead are presented. For the first, it is shown that, without appending any information to the messages of the application program, it is possible to recover from failures using O(|V||E|) messages, where |V| is the number of processors and |E| is the number of communication links in the system. The second algorithm can be used, under certain conditions, to recover from processor failures without forcing nonfaulty processors to roll back.
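The abstract states the message bound but not the mechanism. The Python sketch below illustrates one way to find a consistent recovery line using only control messages, so nothing is piggybacked on application messages. Processors repeatedly tell each neighbor how many messages they have sent it, and each processor rolls back past any state that records receiving more messages than the sender now claims to have sent (an orphan message). This is a hypothetical simplification, not the authors' published algorithm. The state layout, the `recover` function, and the assumption that index 0 is an initial state with all counts zero are illustrative choices. The sketch iterates until no processor rolls back and does not prove the paper's bound. Each round costs 2|E| control messages (one per direction per link), so O(|V|) rounds would give a cost of order O(|V||E|).

```python
from typing import Dict, List, Tuple

# Hypothetical state record: cumulative per-neighbor counts at one restorable point,
# e.g. {"sent": {nbr: n_sent}, "recv": {nbr: n_received}}.
State = Dict[str, Dict[int, int]]


def recover(history: Dict[int, List[State]],
            edges: List[Tuple[int, int]]) -> Tuple[Dict[int, int], int]:
    """Roll processors back until no state records a message that was never sent.

    history[p] lists processor p's restorable states, oldest first. Index 0 is
    assumed to be an initial state with all counts zero (always consistent). A
    crashed processor's list is assumed to contain only the states that survived.
    Returns the chosen state index per processor and the control-message count.
    """
    neighbors: Dict[int, List[int]] = {p: [] for p in history}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)

    current = {p: len(states) - 1 for p, states in history.items()}
    messages = 0
    while True:
        # One synchronous round: p tells each neighbor q how many messages p sent q.
        announced = {(p, q): history[p][current[p]]["sent"][q]
                     for p in history for q in neighbors[p]}
        messages += len(announced)  # one control message per directed link: 2|E|

        changed = False
        for q in history:
            idx = current[q]
            # Step back while q's state received more from some p than p now claims.
            while idx > 0 and any(history[q][idx]["recv"][p] > announced[(p, q)]
                                  for p in neighbors[q]):
                idx -= 1
            if idx != current[q]:
                current[q] = idx
                changed = True
        if not changed:
            return current, messages


# Line topology 0 - 1 - 2. Processor 0 crashed; its surviving state sent only one
# message to 1, but 1 recorded receiving two and then sent a message on to 2.
history = {
    0: [{"sent": {1: 0}, "recv": {1: 0}},
        {"sent": {1: 1}, "recv": {1: 0}}],
    1: [{"sent": {0: 0, 2: 0}, "recv": {0: 0, 2: 0}},
        {"sent": {0: 0, 2: 0}, "recv": {0: 1, 2: 0}},
        {"sent": {0: 0, 2: 0}, "recv": {0: 2, 2: 0}},
        {"sent": {0: 0, 2: 1}, "recv": {0: 2, 2: 0}}],
    2: [{"sent": {1: 0}, "recv": {1: 0}},
        {"sent": {1: 0}, "recv": {1: 1}}],
}
print(recover(history, [(0, 1), (1, 2)]))  # ({0: 1, 1: 1, 2: 0}, 12)
```

In the example, processor 1's rollback cascades to processor 2 in the following round. A third round confirms that nothing changes, giving 3 rounds of 2|E| = 4 control messages each. Because rollbacks only ever move backward through a finite history, the loop always terminates.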
Keywords
fault tolerant computing; file organisation; operating systems (computers); system recovery; application program; communication links; crash recovery; distributed systems; nonfaulty processors; processor failures; reliable systems; Checkpointing; Computer crashes; Computer science; Delay; Fault tolerant systems; Hardware; History; IEL; Protocols
fLanguage
English
Publisher
ieee
Conference_Title
Distributed Computing Systems, 1991., 11th International Conference on
Conference_Location
Arlington, TX
Print_ISBN
0-8186-2144-3
Type
conf
DOI
10.1109/ICDCS.1991.148709
Filename
148709