• DocumentCode
    2875592
  • Title
    Invariants Based Failure Diagnosis in Distributed Computing Systems
  • Author
    Chen, Haifeng ; Jiang, Guofei ; Yoshihira, Kenji ; Saxena, Akhilesh
  • Author_Institution
    NEC Labs. America, Inc., Princeton, NJ, USA
  • fYear
    2010
  • fDate
    Oct. 31 2010-Nov. 3 2010
  • Firstpage
    160
  • Lastpage
    166
  • Abstract
    This paper presents an instance-based approach to diagnosing failures in computing systems. Because a large portion of failures are repeated ones, our method takes advantage of past experience by storing historical failures in a database and retrieving similar instances when a failure occurs. We extract the system 'invariants' by modeling consistent dependencies between system attributes during operation, and construct a network graph based on the learned invariants. When a failure happens, the status of the invariant network, i.e., whether each invariant link is broken or not, provides a view of the failure characteristics. We use a high-dimensional binary vector to store this failure evidence, and develop a novel algorithm to efficiently retrieve failure signatures from the database. Experimental results on a Web-based system demonstrate the effectiveness of our method in diagnosing the injected failures.
  • Keywords
    distributed processing; graph theory; software fault tolerance; vectors; Web based system; binary vector; distributed computing systems; invariant based failure diagnosis; network graph; Computational modeling; Correlation; Data models; Databases; Measurement; Web server; Distributed Systems; Failure Diagnosis; Invariants;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Reliable Distributed Systems, 2010 29th IEEE Symposium on
  • Conference_Location
    New Delhi
  • ISSN
    1060-9857
  • Print_ISBN
    978-0-7695-4250-8
  • Type
    conf
  • DOI
    10.1109/SRDS.2010.26
  • Filename
    5623388