DocumentCode :
244316
Title :
POD-Diagnosis: Error Diagnosis of Sporadic Operations on Cloud Applications
Author :
Xiwei Xu ; Liming Zhu ; Weber, Ingo ; Bass, Len ; Sun, D.
Author_Institution :
SSRG, NICTA, Sydney, NSW, Australia
fYear :
2014
fDate :
23-26 June 2014
Firstpage :
252
Lastpage :
263
Abstract :
Applications in the cloud are subject to sporadic changes due to operational activities such as upgrade, redeployment, and on-demand scaling. These operations are also subject to interferences from other simultaneous operations. Increasing the dependability of these sporadic operations is non-trivial, particularly since traditional anomaly-detection-based diagnosis techniques are less effective during sporadic operation periods. A wide range of legitimate changes confound anomaly diagnosis and make baseline establishment for "normal" operation difficult. The increasing frequency of these sporadic operations (e.g. due to continuous deployment) is exacerbating the problem. Diagnosing failures during sporadic operations relies heavily on logs, while log analysis challenges stemming from noisy, inconsistent and voluminous logs from multiple sources remain largely unsolved. In this paper, we propose Process Oriented Dependability (POD)-Diagnosis, an approach that explicitly models these sporadic operations as processes. These models allow us to (i) determine orderly execution of the process, and (ii) use the process context to filter logs, trigger assertion evaluations, visit fault trees and perform on-demand assertion evaluation for online error diagnosis and root cause analysis. We evaluated the approach on rolling upgrade operations in Amazon Web Services (AWS) while performing other simultaneous operations. During our evaluation, we correctly detected all of the 160 injected faults, as well as 46 interferences caused by concurrent operations. We did this with 91.95% precision. Of the correctly detected faults, the accuracy rate of error diagnosis is 96.55%.
Keywords :
Web services; cloud computing; fault trees; Amazon Web services; POD-diagnosis; anomaly detection; cloud applications; error diagnosis; log analysis challenges; process oriented dependability; sporadic operations; Analytical models; Context; Context modeling; Fault trees; Information filters; Monitoring; DevOps; cloud; deployment; error detection; process mining; system administration
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Dependable Systems and Networks (DSN), 2014 44th Annual IEEE/IFIP International Conference on
Conference_Location :
Atlanta, GA
Type :
conf
DOI :
10.1109/DSN.2014.94
Filename :
6903584