DocumentCode
244316
Title
POD-Diagnosis: Error Diagnosis of Sporadic Operations on Cloud Applications
Author
Xu, Xiwei ; Zhu, Liming ; Weber, Ingo ; Bass, Len ; Sun, D.
Author_Institution
SSRG, NICTA, Sydney, NSW, Australia
fYear
2014
fDate
23-26 June 2014
Firstpage
252
Lastpage
263
Abstract
Applications in the cloud are subject to sporadic changes due to operational activities such as upgrade, redeployment, and on-demand scaling. These operations are also subject to interferences from other simultaneous operations. Increasing the dependability of these sporadic operations is non-trivial, particularly since traditional anomaly-detection-based diagnosis techniques are less effective during sporadic operation periods. A wide range of legitimate changes confound anomaly diagnosis and make baseline establishment for "normal" operation difficult. The increasing frequency of these sporadic operations (e.g. due to continuous deployment) is exacerbating the problem. Diagnosing failures during sporadic operations relies heavily on logs, while log analysis challenges stemming from noisy, inconsistent and voluminous logs from multiple sources remain largely unsolved. In this paper, we propose Process Oriented Dependability (POD)-Diagnosis, an approach that explicitly models these sporadic operations as processes. These models allow us to (i) determine orderly execution of the process, and (ii) use the process context to filter logs, trigger assertion evaluations, visit fault trees and perform on-demand assertion evaluation for online error diagnosis and root cause analysis. We evaluated the approach on rolling upgrade operations in Amazon Web Services (AWS) while performing other simultaneous operations. During our evaluation, we correctly detected all of the 160 injected faults, as well as 46 interferences caused by concurrent operations. We did this with 91.95% precision. Of the correctly detected faults, the accuracy rate of error diagnosis is 96.55%.
Keywords
Web services; cloud computing; fault trees; Amazon Web services; POD-diagnosis; anomaly detection; cloud applications; error diagnosis; fault trees; log analysis challenges; process oriented dependability; sporadic operations; Analytical models; Context; Context modeling; Fault trees; Information filters; Monitoring; DevOps; cloud; deployment; error detection; error diagnosis; process mining; system administration;
fLanguage
English
Publisher
ieee
Conference_Titel
Dependable Systems and Networks (DSN), 2014 44th Annual IEEE/IFIP International Conference on
Conference_Location
Atlanta, GA
Type
conf
DOI
10.1109/DSN.2014.94
Filename
6903584