• DocumentCode
    1202906
  • Title
    The guardian model and primitives for exception handling in distributed systems

  • Author
    Miller, Robert ; Tripathi, Anand

  • Author_Institution
    IBM Corp., Rochester, MN, USA
  • Volume
    30
  • Issue
    12
  • fYear
    2004
  • Firstpage
    1008
  • Lastpage
    1022
  • Abstract
    This work presents an abstraction called guardian for exception handling in distributed and concurrent systems that use coordinated exception handling. This model addresses two fundamental problems with distributed exception handling in a group of asynchronous processes. The first is to perform recovery when multiple exceptions are concurrently signaled. The second is to determine the correct context in which a process should execute its exception handling actions. Several schemes have been proposed in the past to address these problems. These are based on structuring a distributed program as atomic actions based on conversations or transactions and resolving multiple concurrent exceptions into a single one. The guardian in a distributed program represents the abstraction of a global exception handler, which encapsulates rules for handling concurrent exceptions and directing each process to the semantically correct context for executing its recovery actions. Its programming primitives and the underlying distributed execution model are presented here. In contrast to the existing approaches, this model is more basic and can be used to implement or enhance the existing schemes. Using several examples, we illustrate the capabilities of this model. Finally, its advantages and limitations are discussed in contrast to existing approaches.
  • Keywords
    concurrency control; exception handling; parallel programming; software fault tolerance; system recovery; concurrent programming; distributed execution model; distributed programming; exception handling; fault tolerance; system recovery; Application software; Computer errors; Error correction; Fault detection; Fault tolerance; Fault tolerant systems; Process control; Signal resolution; Software systems; Testing;
  • fLanguage
    English
  • Journal_Title
    Software Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0098-5589
  • Type
    jour
  • DOI
    10.1109/TSE.2004.106
  • Filename
    1377194