• DocumentCode
    2714466
  • Title
    Towards Distributed and Adaptive Detection and Localisation of Network Faults
  • Author
    Steinert, Rebecca ; Gillblad, Daniel
  • Author_Institution
    Industrial Applications and Methods Laboratory (IAM), Swedish Institute of Computer Science (SICS), Kista, Sweden
  • fYear
    2010
  • fDate
    9-15 May 2010
  • Firstpage
    384
  • Lastpage
    389
  • Abstract
    We present a statistical probing approach to distributed fault detection in networked systems, based on autonomous configuration of algorithm parameters. Statistical modelling is used to detect and localise network faults, and a detected fault is isolated to a node or link by collaborative fault localisation. From local measurements obtained by probing between nodes, probe response delay and packet drop are modelled via parameter estimation for each link. The estimated model parameters are then used to configure algorithm parameters autonomously, in particular those related to probe intervals and detection mechanisms. Expected fault-detection performance is specified as a cost rather than as specific parameter values, which significantly reduces configuration effort in a distributed system. Compared with methods that do not take observed network conditions into account, the algorithm offers fault detection with increased certainty based on local measurements. We investigate the algorithm's performance for varying user parameters and failure conditions. The simulation results indicate that more than 95% of the generated faults can be detected with few false alarms, and that at least 80% of the link faults and 65% of the node faults are correctly localised. Performance can be improved by parameter adjustments and by using alternative paths for communicating algorithm control messages.
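    The abstract's idea of deriving probe and detection parameters from per-link estimates and a single cost value can be illustrated roughly as follows. This is a minimal sketch and not the authors' algorithm: the `LinkModel` class, the `configure` function, the use of Cantelli's inequality for the timeout, and the assumption of independent packet drops are all illustrative choices that do not come from the record.

```python
import math
from dataclasses import dataclass, field


@dataclass
class LinkModel:
    """Running estimates for one link (hypothetical structure, not from the paper)."""
    delays: list = field(default_factory=list)  # delays of answered probes, in seconds
    sent: int = 0
    lost: int = 0

    def observe(self, delay=None):
        """Record one probe; delay=None means the probe got no response."""
        self.sent += 1
        if delay is None:
            self.lost += 1
        else:
            self.delays.append(delay)

    def drop_rate(self):
        # Laplace-smoothed estimate, so the rate is never exactly 0 or 1.
        return (self.lost + 1) / (self.sent + 2)

    def delay_mean_std(self):
        # Sample mean and sample standard deviation; needs at least two delays.
        n = len(self.delays)
        mean = sum(self.delays) / n
        var = sum((d - mean) ** 2 for d in self.delays) / (n - 1)
        return mean, math.sqrt(var)


def configure(link, false_alarm_cost):
    """Derive a probe timeout and a retry count from one cost value in (0, 1).

    Assumption: the cost is read as an acceptable false-alarm probability.
    """
    mean, std = link.delay_mean_std()
    # Cantelli's inequality: P(delay >= mean + k*std) <= 1 / (1 + k^2),
    # which holds for any delay distribution with finite variance.
    k = math.sqrt(1.0 / false_alarm_cost - 1.0)
    timeout = mean + k * std
    # Smallest n with drop_rate**n <= false_alarm_cost, assuming independent drops.
    retries = math.ceil(math.log(false_alarm_cost) / math.log(link.drop_rate()))
    return timeout, max(1, retries)
```

    In this sketch a user would supply only `false_alarm_cost`; the timeout and the number of unanswered probes required before a fault is reported then follow from what each link has observed, so a lossy or jittery link automatically gets more tolerant settings than a clean one.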
  • Keywords
    fault diagnosis; parameter estimation; statistical analysis; telecommunication security; collaborative fault-localisation; distributed fault-detection; expected fault-detection performance; link faults; network fault localisation; packet drop; probe response delay; statistical modelling; statistical probing-approach; Collaboration; Communication industry; Computer industry; Costs; Delay estimation; Fault detection; Performance evaluation; Probes; Testing; adaptive probing; fault-localisation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Telecommunications (AICT), 2010 Sixth Advanced International Conference on
  • Conference_Location
    Barcelona
  • Print_ISBN
    978-1-4244-6748-8
  • Type
    conf
  • DOI
    10.1109/AICT.2010.65
  • Filename
    5489793