• DocumentCode
    1389755
  • Title

    Distributed Diagnosis of Dynamic Events in Partitionable Arbitrary Topology Networks

  • Author

    Duarte, Elias P., Jr.; Weber, Andréa; Fonseca, Keiko V. Ono

  • Author_Institution
    Dept. of Inf., Fed. Univ. of Parana, Curitiba, Brazil
  • Volume
    23
  • Issue
    8
  • fYear
    2012
  • Firstpage
    1415
  • Lastpage
    1426
  • Abstract
    This work introduces the Distributed Network Reachability (DNR) algorithm, a distributed system-level diagnosis algorithm that allows every node of a partitionable arbitrary topology network to determine which portions of the network are reachable and which are unreachable. DNR is the first distributed diagnosis algorithm that works in the presence of network partitions and healings caused by dynamic fault and repair events. Both crash and timing faults are assumed, and a faulty node is indistinguishable from a network partition. Every link is tested alternately by one of its adjacent nodes in successive testing intervals. Upon detecting a new event, a node disseminates the new diagnostic information to all reachable nodes. New events can occur before the dissemination completes. Whenever a working node detects a new event or is informed of one, it may compute the network reachability using its local diagnostic information. The bounded correctness of DNR is proved, including bounded diagnostic latency, bounded startup, and accuracy. Simulation results are presented for several random and regular topologies, showing the performance of the algorithm under highly dynamic fault situations.
  • Keywords
    fault diagnosis; multiprocessing systems; topology; distributed dynamic event diagnosis; distributed network reachability algorithm; distributed system-level diagnosis algorithm; network partition; partitionable arbitrary topology networks; Clocks; Heuristic algorithms; Network topology; Partitioning algorithms; Testing; Timing; Topology; Network reachability; bounded correctness; distributed diagnosis; dynamic fault diagnosis; multiprocessor systems
  • fLanguage
    English
  • Journal_Title
    Parallel and Distributed Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1045-9219
  • Type

    jour

  • DOI
    10.1109/TPDS.2011.284
  • Filename
    6095526