• DocumentCode
    2356121
  • Title

    Policy-driven fault management in distributed systems

  • Author

    Katchabaw, Michael J. ; Lutfiyya, Hanan L. ; Marshall, Andrew D. ; Bauer, Michael A.

  • Author_Institution
    Dept. of Comput. Sci., Univ. of Western Ontario, London, Ont., Canada
  • fYear
    1996
  • fDate
    30 Oct-2 Nov 1996
  • Firstpage
    236
  • Lastpage
    245
  • Abstract
    Management policies can be used to specify requirements about the desired behaviour of distributed systems. Violations of policies (faults) can then be detected, isolated, located and corrected using a policy-driven fault management system. Other work in this area to date has focused on network-level faults. We believe that in a distributed system it is more appropriate to focus on faults at the application level. Furthermore, this work has been largely domain-specific; a generic, structured approach to this problem is needed. Our work has focused on policy-driven fault management in distributed systems at the application level. In this paper, we define a generic architecture for policy-driven fault management and present a prototype system based on this architecture. We also discuss experience to date using and experimenting with our prototype system.
  • Keywords
    distributed processing; fault diagnosis; fault location; formal specification; software management; software reliability; system recovery; OSI management framework; application-level faults; distributed applications management; distributed computing environments; distributed systems; fault correction; fault detection; fault isolation; generic architecture; generic structured approach; policy violations; policy-driven fault management; prototype system; requirements specification; Communication networks; Computer network management; Computer networks; Computer science; Distributed computing; Network servers; Open systems; Operating systems; Prototypes; Thermal management
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Software Reliability Engineering, 1996. Proceedings., Seventh International Symposium on
  • Conference_Location
    White Plains, NY
  • Print_ISBN
    0-8186-7707-4
  • Type

    conf

  • DOI
    10.1109/ISSRE.1996.558833
  • Filename
    558833