Title :
Introspective failure analysis: avoiding correlated failures in peer-to-peer systems
Author :
Weatherspoon, Hakim ; Moscovitz, Tal ; Kubiatowicz, John
Author_Institution :
Div. of Comput. Sci., California Univ., Berkeley, CA, USA
Abstract :
Failure independence is an important assumption for many fault tolerance techniques. Unfortunately, real systems exhibit correlated failures. In this paper, we present a framework for online discovery of groups of server nodes that are maximally independent in their failure characteristics. We discuss the framework in detail and provide a preliminary evaluation.
Keywords :
Internet; computer network reliability; fault tolerant computing; file servers; correlated failures; fault tolerance techniques; introspective failure analysis; online server node group discovery; peer-to-peer systems; Algorithm design and analysis; Availability; Computer science; Failure analysis; Fault tolerance; Fault tolerant systems; Peer to peer computing; Protocols; Redundancy; Web server
Conference_Title :
Reliable Distributed Systems, 2002. Proceedings. 21st IEEE Symposium on
Print_ISBN :
0-7695-1659-9
DOI :
10.1109/RELDIS.2002.1180211