DocumentCode
3348564
Title
Failure Detection in Large Scale Systems: a Survey
Author
Pasin, Marcia ; Fontaine, Stéphane ; Bouchenak, Sara
Author_Institution
Lab. de Sist. de Comput., Fed. Univ. of Santa Maria, Santa Maria
fYear
2008
fDate
7-11 April 2008
Firstpage
165
Lastpage
168
Abstract
Failure detection is a basic service for building dependable systems. The large-scale distribution of computing systems makes failure detectors much harder to build, and providing QoS (quality of service) guarantees in this context is a challenging task. The objective of this paper is twofold: (1) to propose a complete set of classification criteria for comparing different failure detection mechanisms, and (2) to survey, based on these criteria, the main failure detection solutions for large-scale distributed systems.
Keywords
distributed processing; fault tolerant computing; large-scale systems; quality of service; QoS; failure detection; large scale distributed systems; Best practices; Condition monitoring; Context-aware services; Detectors; Distributed computing; Fault detection; Heart beat; Scalability
fLanguage
English
Publisher
IEEE
Conference_Titel
Network Operations and Management Symposium Workshops, 2008. NOMS Workshops 2008. IEEE
Conference_Location
Salvador da Bahia
Print_ISBN
978-1-4244-2067-4
Type
conf
DOI
10.1109/NOMSW.2007.28
Filename
4509944