DocumentCode
3155548
Title
The PTC scheme for designing loosely coupled recoverable processes: issues in realizing bounded recovery time
Author
Kim, K.H.
Author_Institution
Dept. of Electr. & Comput. Eng., California Univ., Irvine, CA, USA
fYear
1992
fDate
14-16 Apr 1992
Firstpage
287
Lastpage
296
Abstract
The technology for designing loosely coupled distributed computer systems (DCSs) required to tolerate propagated errors caused by software and/or hardware has remained in an immature state. This paper focuses on the type of DCS applications where a system is structured as a set of loosely coupled interacting processes distributed among multiple physical sites and each process is designed in the "partitioned design" mode, i.e., designed with its interface specification only, rather than with full knowledge of interfaces between other processes (or sites). The thesis is that fault tolerance capabilities must be designed into loosely coupled processes without violating the design policy. The programmer-transparent coordination (PTC) scheme is one such approach that has been evolving since 1978. While the basic PTC scheme, called the PTC/OR (PTC with obedient receiver) scheme, facilitates various forms of cooperative backward recovery in systems of loosely coupled processes, it has one drawback: the difficulty of bounding worst-case recovery time. After discussing various possible solution approaches and their limitations, a promising approach called the PTC/SL (PTC with session leaders) scheme, which superimposes additional rules on structuring process interactions onto those of the PTC/OR scheme, is presented. Under the PTC/SL scheme, various flexible forms of process interactions are still allowed while the task of ensuring bounded recovery time is made a simple one. Several research issues related to the PTC/SL scheme, e.g., efficient implementation techniques, remain as subjects for future research.
Keywords
distributed processing; fault tolerant computing; system recovery; PTC/SL; bounded recovery time; cooperative backward recovery; fault tolerance; hardware; interface specification; loosely coupled recoverable processes; programmer-transparent coordination; propagated errors; session leaders; software; worst-case recovery time; Application software; Computer errors; Design engineering; Distributed control; Fault detection; Fault tolerance; Fault tolerant systems; Hardware; Process design; Wide area networks
fLanguage
English
Publisher
IEEE
Conference_Titel
Distributed Computing Systems, 1992., Proceedings of the Third Workshop on Future Trends of
Conference_Location
Taipei
Print_ISBN
0-8186-2755-7
Type
conf
DOI
10.1109/FTDCS.1992.217482
Filename
217482