Title :
Invariants Based Failure Diagnosis in Distributed Computing Systems
Author :
Chen, Haifeng ; Jiang, Guofei ; Yoshihira, Kenji ; Saxena, Akhilesh
Author_Institution :
NEC Labs. America, Inc., Princeton, NJ, USA
fDate :
Oct. 31 2010-Nov. 3 2010
Abstract :
This paper presents an instance-based approach to diagnosing failures in computing systems. Because a large portion of failures are recurrences of earlier ones, our method takes advantage of past experience by storing historical failures in a database and retrieving similar instances when a failure occurs. We extract the system "invariants" by modeling consistent dependencies between system attributes during operation, and construct a network graph from the learned invariants. When a failure happens, the status of the invariant network, i.e., whether each invariant link is broken or not, provides a view of the failure's characteristics. We use a high-dimensional binary vector to store this failure evidence, and develop a novel algorithm to efficiently retrieve failure signatures from the database. Experimental results on a web-based system demonstrate the effectiveness of our method in diagnosing the injected failures.
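The signature idea in the abstract can be illustrated with a minimal sketch. It is not the authors' retrieval algorithm, which the abstract does not describe. It assumes that an invariant counts as broken when a model residual exceeds a threshold, and it substitutes a brute-force Jaccard similarity search for the paper's efficient retrieval. All function names, thresholds, and failure labels below are hypothetical.

```python
from typing import Dict, List, Tuple


def broken_vector(residuals: List[float], thresholds: List[float]) -> int:
    """Pack invariant status into an int bitmask.

    Bit i is 1 when invariant i is considered broken. Thresholding the
    residual is an assumption made for this sketch.
    """
    bits = 0
    for i, (residual, threshold) in enumerate(zip(residuals, thresholds)):
        if abs(residual) > threshold:
            bits |= 1 << i
    return bits


def jaccard(a: int, b: int) -> float:
    """Jaccard similarity between two sets of broken invariants."""
    union = a | b
    if union == 0:
        return 1.0  # neither signature has a broken invariant
    return bin(a & b).count("1") / bin(union).count("1")


def retrieve(query: int, database: Dict[str, int], k: int = 1) -> List[Tuple[str, float]]:
    """Return the k stored signatures most similar to the query.

    Brute-force scan, used here in place of the paper's retrieval algorithm.
    """
    scored = [(label, jaccard(query, sig)) for label, sig in database.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical history of past failures over five invariants.
history = {
    "db_lock": 0b00111,
    "memory_leak": 0b11000,
    "net_fault": 0b11011,
}

# Invariants 0, 1 and 4 exceed their thresholds in this made-up observation.
query = broken_vector([0.9, 0.8, 0.1, 0.0, 0.7], [0.5] * 5)
print(retrieve(query, history, k=1))
```

Packing the vector into an integer keeps the similarity computation to a few bitwise operations, but a linear scan like this one grows with the size of the database, which is the cost an efficient retrieval algorithm is meant to avoid.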
Keywords :
distributed processing; graph theory; software fault tolerance; vectors; Web based system; binary vector; distributed computing systems; invariant based failure diagnosis; network graph; Computational modeling; Correlation; Data models; Databases; Measurement; Web server; Distributed Systems; Failure Diagnosis; Invariants;
Conference_Title :
Reliable Distributed Systems, 2010 29th IEEE Symposium on
Conference_Location :
New Delhi
Print_ISBN :
978-0-7695-4250-8
DOI :
10.1109/SRDS.2010.26