• DocumentCode
    1897622
  • Title

    Distributed Diagnosis of Failures in a Three Tier E-Commerce System

  • Author

    Khanna, Gunjan ; Laguna, Ignacio ; Arshad, Fahad A. ; Bagchi, Saurabh

  • Author_Institution
    Purdue Univ., West Lafayette
  • fYear
    2007
  • fDate
    10-12 Oct. 2007
  • Firstpage
    185
  • Lastpage
    198
  • Abstract
    For dependability outages in distributed Internet infrastructures, it is often not enough to detect a failure; it is also necessary to diagnose it, i.e., to identify its source. Complex applications deployed in multi-tier environments make diagnosis challenging because of fast error propagation, black-box applications, high diagnosis delay, the number of states that can be maintained, and imperfect diagnostic tests. Here, we propose a probabilistic diagnosis model for arbitrary failures in components of a distributed application. The monitoring system (the Monitor) passively observes the message exchanges between the components and, at runtime, performs a probabilistic diagnosis of the component that was the root cause of a failure. We demonstrate the approach by applying it to the Pet Store J2EE application, and we compare it with Pinpoint by quantifying latency and accuracy in both systems. The Monitor outperforms Pinpoint by achieving comparably accurate diagnosis with higher precision in shorter time.
  • Keywords
    Internet; electronic commerce; object-oriented programming; program diagnostics; Monitor; Pet Store J2EE application; Pinpoint; arbitrary failures; black-box applications; complex applications; distributed Internet infrastructures; distributed application; e-commerce system; failure detection; failure distributed diagnosis; fast error propagation; high diagnosis delay; imperfect diagnostic tests; message exchanges; monitoring system; multitier environments; probabilistic diagnosis model; Application software; Condition monitoring; Distributed computing; Fault detection; Fault diagnosis; Positron emission tomography; Protocols; Reliability engineering; Runtime; Testing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Reliable Distributed Systems, 2007. SRDS 2007. 26th IEEE International Symposium on
  • Conference_Location
    Beijing
  • ISSN
    1060-9857
  • Print_ISBN
    0-7695-2995-X
  • Type

    conf

  • DOI
    10.1109/SRDS.2007.16
  • Filename
    4365695