Title :
Dependency-aware fault diagnosis with metric-correlation models in enterprise software systems
Author :
Jiang, Miao ; Munawar, Mohammad A. ; Reidemeister, Thomas ; Ward, Paul A S
Author_Institution :
E&CE Dept., Univ. of Waterloo, Waterloo, ON, Canada
Abstract :
The normal operation of enterprise software systems can be modeled by stable correlations between various system metrics; errors are detected when some of these correlations fail to hold. The typical approach to diagnosis (i.e., pinpointing the faulty component) based on these correlation models is to use the Jaccard coefficient or some variant thereof, without reference to system structure, dependency data, or prior fault data. In this paper we demonstrate the intrinsic limitations of this approach and propose a solution that mitigates them. We assume knowledge of the dependencies between components in the system and take this information into account when analyzing the correlation models. We also propose using the Tanimoto coefficient instead of the Jaccard coefficient to assign anomaly scores to components. We evaluate our new algorithm on a Trade6-based test-bed. We show that it ranks the faulty component among the three components with the highest anomaly scores in four out of nine cases, whereas the prior method does so in only one.
Keywords :
business data processing; fault diagnosis; Jaccard coefficient; Tanimoto coefficient; Trade6-based test-bed; dependency-aware fault diagnosis; enterprise software systems; metric-correlation models; Availability; Correlation; Fault diagnosis; Measurement; Monitoring; Software systems; Time factors
Conference_Title :
Network and Service Management (CNSM), 2010 International Conference on
Conference_Location :
Niagara Falls, ON
Print_ISBN :
978-1-4244-8910-7
Electronic_ISBN :
978-1-4244-8908-4
DOI :
10.1109/CNSM.2010.5691319
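Note :
The abstract contrasts the Jaccard and Tanimoto coefficients as ways of assigning anomaly scores to components. The sketch below shows only the standard textbook definitions of the two coefficients; it is not the paper's algorithm. The function names, the model identifiers (m1, m2, m3), and the idea of comparing a component's associated correlation models against the violated ones are assumptions made for illustration, since the abstract does not give the exact formulation or how dependency data enters the score.

```python
from typing import Sequence, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Set-based Jaccard coefficient: |A & B| / |A | B| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def tanimoto(x: Sequence[float], y: Sequence[float]) -> float:
    """Tanimoto coefficient for real-valued vectors: x.y / (|x|^2 + |y|^2 - x.y)."""
    dot = sum(p * q for p, q in zip(x, y))
    denom = sum(p * p for p in x) + sum(q * q for q in y) - dot
    return dot / denom if denom else 0.0


# Hypothetical data: correlation models tied to one component vs. models
# currently violated. None of these identifiers come from the paper.
component_models = {"m1", "m2"}
violated_models = {"m2", "m3"}
binary_score = jaccard(component_models, violated_models)

# The same comparison as vectors over (m1, m2, m3). With 0/1 entries the
# Tanimoto coefficient equals the Jaccard coefficient; real-valued weights
# (an assumption here) let the score reflect graded evidence.
binary_vector_score = tanimoto([1.0, 1.0, 0.0], [0.0, 1.0, 1.0])
weighted_score = tanimoto([1.0, 0.5, 0.0], [0.0, 1.0, 1.0])
```

On 0/1 vectors the two coefficients coincide, so any difference between them in practice comes from using non-binary weights; which weights the authors use is not stated in the abstract.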