Title :
System Monitoring with Metric-Correlation Models
Author :
Jiang, Miao ; Munawar, Mohammad A. ; Reidemeister, Thomas ; Ward, Paul A S
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of Waterloo, Waterloo, ON, Canada
fDate :
12/1/2011 12:00:00 AM
Abstract :
Modern software systems expose management metrics to help track their health. Recently, it was demonstrated that correlations among these metrics allow errors to be detected and their causes localized. Prior research shows that linear models can capture many of these correlations. However, our research shows that several factors may prevent linear models from accurately describing correlations, even if the underlying relationship is linear. Common phenomena we have observed include relationships that evolve, relationships with missing variables, and heterogeneous residual variance of the correlated metrics. Usually these phenomena can be discovered by testing the underlying linear models for heteroscedasticity. Such behavior violates the assumptions of simple linear regression, which thus fails to describe system dynamics correctly. In this paper we address the above challenges by employing efficient variants of Ordinary Least Squares regression models. In addition, we automate error detection by applying the Wilcoxon Rank-Sum test once the correlations have been properly modeled. We validate our models using a realistic Java-Enterprise-Edition application. Using fault-injection experiments, we show that our improved models capture system behavior accurately.
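The abstract does not spell out how the regression model and the Wilcoxon Rank-Sum test are combined, so the following is only a minimal sketch of one plausible reading: fit a linear model between two metrics on fault-free data, then compare the residuals of a new observation window against the baseline residuals with a rank-sum test. Everything here is an assumption made for illustration: the function names (`fit_ols`, `rank_sum_p_value`, `demo`), the synthetic data, the injected offset fault, and the use of plain Ordinary Least Squares rather than the efficient variants the paper employs. The rank-sum p-value uses the normal approximation without a tie correction.

```python
import math
import random


def fit_ols(x, y):
    """Fit y ~ a + b*x by ordinary least squares; return (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def residuals(x, y, a, b):
    """Differences between observed y and the model prediction a + b*x."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def rank_sum_p_value(sample_a, sample_b):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation).

    Ties receive averaged ranks; the variance is not tie-corrected.
    """
    pooled = sorted((v, i) for i, v in enumerate(sample_a + sample_b))
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = avg_rank
        i = j + 1
    n1, n2 = len(sample_a), len(sample_b)
    w = sum(ranks[:n1])  # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    return math.erfc(abs(z) / math.sqrt(2))


def make_window(rng, n, offset=0.0):
    """Synthetic metric pair: y = 2x + 5 + offset + Gaussian noise."""
    x = [rng.uniform(10, 100) for _ in range(n)]
    y = [2 * xi + 5 + offset + rng.gauss(0, 1) for xi in x]
    return x, y


def demo(seed=0):
    """Return (p_healthy, p_faulty) for two synthetic test windows."""
    rng = random.Random(seed)
    train_x, train_y = make_window(rng, 200)
    a, b = fit_ols(train_x, train_y)
    baseline = residuals(train_x, train_y, a, b)

    ok_x, ok_y = make_window(rng, 50)
    bad_x, bad_y = make_window(rng, 50, offset=4.0)  # hypothetical fault
    p_healthy = rank_sum_p_value(baseline, residuals(ok_x, ok_y, a, b))
    p_faulty = rank_sum_p_value(baseline, residuals(bad_x, bad_y, a, b))
    return p_healthy, p_faulty


if __name__ == "__main__":
    p_ok, p_bad = demo()
    print(f"healthy window p-value: {p_ok:.4f}")
    print(f"faulty window p-value:  {p_bad:.2e}")
```

In this sketch a small p-value for a window suggests that its residuals no longer follow the baseline distribution, which could be flagged as a possible error. The significance threshold and window sizes would need tuning for a real system.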
Keywords :
error detection; least squares approximations; program testing; regression analysis; software fault tolerance; software metrics; system monitoring; Java-Enterprise-Edition application; Wilcoxon rank-sum test; correlations modeling; fault injection experiment; heterogeneous residual variance; heteroscedasticity testing; linear model; linear regression; metric-correlation model; ordinary least squares regression model; software system management metrics; system dynamics; Adaptation models; Computational modeling; Correlation; Measurement; Monitoring; Predictive models; Software systems; fault detection; heteroscedasticity; metric-correlation models; multi-variable correlations; recursive least squares
Journal_Title :
Network and Service Management, IEEE Transactions on
DOI :
10.1109/TNSM.2011.120811.100033