• DocumentCode
    390040
  • Title
    Failure detectors for large-scale distributed systems
  • Author
    Hayashibara, Naohiro; Cherif, Adel; Katayama, Takuya
  • Author_Institution
    Graduate Sch. of Inf. Sci., JAIST, Ishikawa, Japan
  • fYear
    2002
  • fDate
    2002
  • Firstpage
    404
  • Lastpage
    409
  • Abstract
    This paper discusses the problem of implementing a scalable failure detection service for grid systems. Traditional implementations of failure detectors are often tuned for local networks and fail to address important problems found in wide-area distributed systems, such as grid systems. We identify some of the most important problems raised in the context of grids. We then survey recent proposals that can help solve some of these problems.
  • Keywords
    computer network reliability; fault tolerant computing; wide area networks; failure detectors; grid systems; large-scale distributed systems; scalable failure detection service; wide-area distributed systems; Computer crashes; Computer networks; Computerized monitoring; Condition monitoring; Detectors; Distributed computing; Grid computing; Information science; Large-scale systems; Protocols
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Reliable Distributed Systems, 2002. Proceedings. 21st IEEE Symposium on
  • ISSN
    1060-9857
  • Print_ISBN
    0-7695-1659-9
  • Type
    conf
  • DOI
    10.1109/RELDIS.2002.1180218
  • Filename
    1180218