DocumentCode
2875592
Title
Invariants Based Failure Diagnosis in Distributed Computing Systems
Author
Chen, Haifeng ; Jiang, Guofei ; Yoshihira, Kenji ; Saxena, Akhilesh
Author_Institution
NEC Labs. America, Inc., Princeton, NJ, USA
fYear
2010
fDate
Oct. 31 2010-Nov. 3 2010
Firstpage
160
Lastpage
166
Abstract
This paper presents an instance-based approach to diagnosing failures in computing systems. Because a large portion of failures are recurrences of earlier ones, our method takes advantage of past experience by storing historical failures in a database and retrieving similar instances when a new failure occurs. We extract system 'invariants' by modeling consistent dependencies between system attributes during operation, and construct a network graph based on the learned invariants. When a failure happens, the status of the invariant network, i.e., whether each invariant link is broken or not, provides a view of the failure's characteristics. We use a high-dimensional binary vector to store this failure evidence, and develop a novel algorithm to efficiently retrieve failure signatures from the database. Experimental results in a web-based system have demonstrated the effectiveness of our method in diagnosing injected failures.
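The signature idea in the abstract can be illustrated with a minimal sketch. It assumes the following, none of which is taken from the paper: each invariant is a hypothetical linear relation y ≈ a*x + b between two attributes, and it counts as "broken" when the absolute residual exceeds a fixed tolerance. Broken links form a binary vector (packed into a Python int). Past failures are ranked by Jaccard similarity, which is a plainly swapped-in stand-in and not the authors' retrieval algorithm. The names `broken_links`, `SignatureDB`, and the sample attributes are illustrative only.

```python
from dataclasses import dataclass, field


def broken_links(invariants, sample, tolerance):
    """Return indices of invariants violated by one measurement sample.

    Hypothetical model: invariant i is (x_name, y_name, a, b), meaning
    y ~= a * x + b; it is broken when the absolute residual > tolerance.
    """
    out = []
    for i, (x_name, y_name, a, b) in enumerate(invariants):
        residual = abs(sample[y_name] - (a * sample[x_name] + b))
        if residual > tolerance:
            out.append(i)
    return out


@dataclass
class SignatureDB:
    """Hypothetical store of historical failure signatures.

    Each signature is a binary vector over the invariant links
    (bit i = 1 if link i was broken), packed into an int bit set.
    """
    num_links: int
    entries: list = field(default_factory=list)  # (label, bits) pairs

    def _pack(self, links):
        bits = 0
        for i in links:
            if not 0 <= i < self.num_links:
                raise ValueError(f"link index {i} out of range")
            bits |= 1 << i
        return bits

    def add(self, label, links):
        # Record a past failure under its diagnosis label.
        self.entries.append((label, self._pack(links)))

    def query(self, links, k=1):
        # Rank stored failures by Jaccard similarity of broken-link sets.
        # Assumed similarity measure, not the paper's algorithm.
        q = self._pack(links)
        scored = []
        for label, bits in self.entries:
            union = bin(q | bits).count("1")
            inter = bin(q & bits).count("1")
            score = inter / union if union else 1.0
            scored.append((score, label))
        scored.sort(key=lambda s: s[0], reverse=True)
        return scored[:k]


# Usage with made-up attributes and thresholds.
invariants = [
    ("req_rate", "cpu", 0.5, 1.0),
    ("req_rate", "db_queries", 2.0, 0.0),
    ("cpu", "mem", 1.0, 10.0),
]
db = SignatureDB(num_links=len(invariants))
db.add("db_overload", [1])
db.add("cpu_hog", [0, 2])

sample = {"req_rate": 100.0, "cpu": 51.0, "db_queries": 350.0, "mem": 61.0}
broken = broken_links(invariants, sample, tolerance=5.0)
print(broken)            # [1]
print(db.query(broken))  # [(1.0, 'db_overload')]
```

Packing the vector into an integer keeps each signature compact and makes overlap counting a pair of bitwise operations, which matters when the invariant network has many links.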
Keywords
distributed processing; graph theory; software fault tolerance; vectors; Web based system; binary vector; distributed computing systems; invariant based failure diagnosis; network graph; Computational modeling; Correlation; Data models; Databases; Measurement; Web server; Distributed Systems; Failure Diagnosis; Invariants;
fLanguage
English
Publisher
ieee
Conference_Titel
Reliable Distributed Systems, 2010 29th IEEE Symposium on
Conference_Location
New Delhi
ISSN
1060-9857
Print_ISBN
978-0-7695-4250-8
Type
conf
DOI
10.1109/SRDS.2010.26
Filename
5623388