Title :
A methodology for root-cause analysis in component based systems
Author :
Kui Wang;Carol Fung;Chao Ding;Polo Pei;Shaohan Huang;Zhongzhi Luan;Depei Qian
Author_Institution :
Beijing municipal key laboratory of network technology, Beihang University, Beijing, China
fDate :
6/1/2015 12:00:00 AM
Abstract :
In component-based enterprise systems, anomaly detectors are commonly deployed on application-level components but not on lower-level functional components. When anomaly alarms are triggered, system managers are expected to handle them promptly to avoid cascading failures, but the excessively large volume of alarms makes manual handling impractical. Most existing root-cause analysis methods assume that all components are monitored and perform the analysis based on the time correlation of the generated alarms. However, full monitoring coverage may not be practical due to cost and complexity. In this paper, we present RCSF, a root-cause analysis method that targets systems in which only application-level components are monitored by anomaly detectors. The method analyzes the performance logs of functional components and searches for the most probable fault propagation sequences based on anomaly analysis. We evaluate RCSF on real enterprise system data and compare it with baseline methods. Experimental results show that the proposed method can effectively pinpoint the root causes of failures by providing a short list of the most probable causes, and that its performance is significantly better than that of the baseline methods.
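The abstract describes the idea only at a high level: alarms fire on monitored application-level components, while the unmonitored functional components beneath them are examined through their performance logs to find likely fault propagation sequences. The sketch below illustrates that general idea under loudly stated assumptions; it is NOT the RCSF algorithm from the paper. The names `anomaly_score`, `propagation_paths`, and `rank_root_causes`, the z-score anomaly metric, the `min_score` threshold, and the "mean anomaly along the path" scoring rule are all hypothetical choices made for illustration. It walks the dependency graph upward from each functional component toward the alarmed application components and returns a short ranked list of candidate root causes.

```python
from statistics import mean, pstdev


def anomaly_score(baseline, recent):
    """Hypothetical metric: z-score of the recent mean against a baseline window."""
    mu = mean(baseline)
    sigma = pstdev(baseline) or 1e-9  # avoid division by zero for flat baselines
    return max(0.0, (mean(recent) - mu) / sigma)  # only slowdowns count as anomalous


def propagation_paths(callers, start, targets):
    """Yield simple paths from `start` up to any alarmed component.

    `callers` maps a component to the components that depend on it, so
    following it models a fault propagating upward toward the application layer.
    """
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node in targets:
            yield path
            continue
        for nxt in callers.get(node, []):
            if nxt not in path:  # keep paths simple (no cycles)
                stack.append((nxt, path + [nxt]))


def rank_root_causes(callers, logs, alarmed, k=3, min_score=1.0):
    """Rank functional components as candidate root causes (illustrative only)."""
    scores = {c: anomaly_score(b, r) for c, (b, r) in logs.items()}
    ranking = []
    for candidate, own in scores.items():
        if own < min_score:
            continue  # a root cause should itself look anomalous
        best = 0.0
        for path in propagation_paths(callers, candidate, alarmed):
            # Score a propagation sequence by the mean anomaly of the
            # functional (log-monitored) components along it.
            functional = [n for n in path if n in scores]
            best = max(best, mean(scores[n] for n in functional))
        if best > 0:
            ranking.append((best, candidate))
    ranking.sort(reverse=True)
    return [c for _, c in ranking[:k]]


if __name__ == "__main__":
    # Hypothetical topology: disk -> db -> cache -> orders-api, plus db -> orders-api.
    callers = {"db": ["cache", "orders-api"], "cache": ["orders-api"], "disk": ["db"]}
    # (baseline latencies, recent latencies) per functional component, in ms.
    logs = {
        "db": ([10, 11, 9, 10], [30, 32]),
        "cache": ([5, 5, 6, 4], [9, 9]),
        "disk": ([2, 2, 2, 2], [2, 2]),
    }
    print(rank_root_causes(callers, logs, alarmed={"orders-api"}))
```

With this made-up data, `db` ranks first and `cache` second, while `disk` is excluded because its own latency is unchanged. Averaging over the path (rather than taking the minimum) is one simple way to let a strongly anomalous deep component outrank a mildly anomalous intermediate one; the actual RCSF scoring may differ.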
Keywords :
"Monitoring","Business","Correlation","Quality of service","Fault detection","Complexity theory","Biomedical monitoring"
Conference_Title :
Quality of Service (IWQoS), 2015 IEEE 23rd International Symposium on
DOI :
10.1109/IWQoS.2015.7404741