DocumentCode :
1649103
Title :
Extended recovery protocol in distributed systems
Author :
Higaki, Hiroaki ; Takizawa, Makoto
Author_Institution :
Dept. of Comput. & Syst. Eng., Tokyo Denki Univ., Saitama, Japan
fYear :
1998
Firstpage :
310
Lastpage :
315
Abstract :
This paper proposes a novel protocol for taking checkpoints and asynchronously restarting processes to recover from transient faults in asynchronous distributed systems. In the protocol, each process can be restarted asynchronously without livelock. Each process can hold multiple checkpoints to minimize the amount of computation wasted by recovery. Moreover, a garbage collection method is discussed. Each process has at most n checkpoints, where n is the number of processes. Only O(l) control messages need to be transmitted, where l is the number of communication channels in the system.
Keywords :
distributed processing; protocols; system recovery; asynchronous distributed systems; checkpoints; communication channels; control messages; distributed systems; garbage collection; protocol; recovery protocol; Application software; Checkpointing; Computer crashes; Computer networks; Hardware; Large-scale systems; Protocols; Read only memory; System recovery; Systems engineering and theory;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information Networking, 1998. (ICOIN-12) Proceedings., Twelfth International Conference on
Conference_Location :
Tokyo
Print_ISBN :
0-8186-7225-0
Type :
conf
DOI :
10.1109/ICOIN.1998.648400
Filename :
648400