• DocumentCode
    3155548
  • Title

    The PTC scheme for designing loosely coupled recoverable processes: issues in realizing bounded recovery time

  • Author

    Kim, K.H.

  • Author_Institution
    Dept. of Electr. & Comput. Eng., California Univ., Irvine, CA, USA
  • fYear
    1992
  • fDate
    14-16 Apr 1992
  • Firstpage
    287
  • Lastpage
    296
  • Abstract
    The technology for designing loosely coupled distributed computer systems (DCSs) required to tolerate propagated errors caused by software and/or hardware has remained in an immature state. This paper focuses on the type of DCS applications where a system is structured as a set of loosely coupled interacting processes distributed among multiple physical sites and each process is designed in the 'partitioned design' mode, i.e., designed with its interface specification only, rather than with full knowledge of interfaces between other processes (or sites). The thesis is that fault tolerance capabilities must be designed into loosely coupled processes without violating the design policy. The programmer-transparent coordination (PTC) scheme is one such approach that has been evolving since 1978. While the basic PTC scheme, called the PTC/OR (PTC with obedient receiver) scheme, facilitates various forms of cooperative backward recovery in systems of loosely coupled processes, it has one drawback: the difficulty of bounding worst-case recovery time. After discussing various possible solution approaches and their limitations, a promising approach called the PTC/SL (PTC with session leaders) scheme, which superimposes additional rules on structuring process interactions onto those of the PTC/OR scheme, is presented. Under the PTC/SL scheme, various flexible forms of process interactions are still allowed while the task of ensuring bounded recovery time is made a simple one. Several research issues related to the PTC/SL scheme, e.g., efficient implementation techniques, remain as subjects for future research.
  • Keywords
    distributed processing; fault tolerant computing; system recovery; PTC/SL; bounded recovery time; cooperative backward recovery; fault tolerance; hardware; interface specification; loosely coupled recoverable processes; programmer-transparent coordination; propagated errors; session leaders; software; worst-case recovery time; Application software; Computer errors; Design engineering; Distributed control; Fault detection; Fault tolerance; Fault tolerant systems; Hardware; Process design; Wide area networks;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Proceedings of the Third Workshop on Future Trends of Distributed Computing Systems, 1992
  • Conference_Location
    Taipei
  • Print_ISBN
    0-8186-2755-7
  • Type

    conf

  • DOI
    10.1109/FTDCS.1992.217482
  • Filename
    217482