• DocumentCode
    583018
  • Title
    Improving the Quality of Service of Fault Detection in Distributed Platforms under Adverse Network Conditions
  • Author
    Lemos, Fernando Tarlá Cardoso ; Sato, Liria Matsumoto
  • fYear
    2012
  • fDate
    17-19 Oct. 2012
  • Firstpage
    171
  • Lastpage
    178
  • Abstract
    Fault detection is a core functionality required by most fault tolerance strategies, but it often depends on reliable communication between the computing nodes that exchange monitoring information. We present techniques to improve the robustness of fault detectors for distributed platforms when network connectivity is affected by packet loss and delays. Such network conditions can be found in computing grids that connect geographically distant resources. We present results from experimental tests conducted in a simulated environment; the results show significant improvement over traditional approaches.
  • Keywords
    digital simulation; grid computing; quality of service; software fault tolerance; software reliability; adverse network conditions; computing grids; computing nodes; delays; distributed platforms; fault detection; fault tolerance strategies; geographically distant resources; monitoring information; network connectivity; packet loss; reliable communication; simulated environment; Biomedical monitoring; Computational modeling; Detectors; Heart beat; Monitoring; Payloads; Software; Distributed Failure Detectors; Failure Detection; Fault Tolerance
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Systems (WSCAD-SSC), 2012 13th Symposium on
  • Conference_Location
    Petropolis
  • Print_ISBN
    978-1-4673-4468-5
  • Type
    conf
  • DOI
    10.1109/WSCAD-SSC.2012.25
  • Filename
    6391779