• DocumentCode
    1435416
  • Title

    Improving the Robustness of Distributed Failure Detectors in Adverse Conditions

  • Author

    Lemos, F.T.C. ; Sato, L.M.

  • Author_Institution
    Univ. de Sao Paulo (USP), Sao Paulo, Brazil
  • Volume
    10
  • Issue
    1
  • fYear
    2012
  • Firstpage
    1364
  • Lastpage
    1369
  • Abstract
    Failure detection is at the core of most fault tolerance strategies, but it often depends on reliable communication. We present new failure detection algorithms suitable as components of a fault tolerance system deployed under adverse network conditions (such as loosely connected and administered computing grids). The algorithms pack redundancy into heartbeat messages, thereby improving on the robustness of traditional protocols. Experimental tests conducted in a simulated environment with adverse network conditions show significant improvement over existing solutions.
  • Keywords
    protocols; telecommunication network reliability; adverse network conditions; distributed failure detectors; fault tolerance strategies; heartbeat messages; reliable communication; Biomedical monitoring; Detectors; Fault tolerance; Heart beat; Monitoring; Payloads; Robustness; Failure Detection
  • fLanguage
    English
  • Journal_Title
    Latin America Transactions, IEEE (Revista IEEE America Latina)
  • Publisher
    IEEE
  • ISSN
    1548-0992
  • Type

    jour

  • DOI
    10.1109/TLA.2012.6142485
  • Filename
    6142485