• DocumentCode
    1148823
  • Title

    Abstractions for Node Level Passive Fault Detection in Distributed Systems

  • Author

    Oikonomou, Kostas N. ; Kain, Richard Y.

  • Author_Institution
    Bell Laboratories
  • Issue
    6
  • fYear
    1983
  • fDate
    6/1/1983 12:00:00 AM
  • Firstpage
    543
  • Lastpage
    550
  • Abstract
    We introduce a scheme for passive node-level fault detection in a distributed system. With each system node we associate a low-cost, low-complexity observer that monitors the pattern of incoming and outgoing messages and compares it against an abstracted model of the node's behavior. We develop a fault detection procedure, which is probabilistic because of nondeterminism in the simplified node model. Abstraction reduces model complexity, but renders some errors undetectable by the observer. In the paper we characterize these undetectable errors. Succeeding studies show how to select model abstractions to lower the number of undetectable errors.
  • Keywords
    Concurrent fault detection; distributed systems; fault detection; Computer errors; Condition monitoring; Costs; Frequency estimation; Missiles; Observers; Probability; Signal design; System testing
  • fLanguage
    English
  • Journal_Title
    Computers, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9340
  • Type

    jour

  • DOI
    10.1109/TC.1983.1676276
  • Filename
    1676276