  • DocumentCode
    2050612
  • Title
    Filtering System Metrics for Minimal Correlation-Based Self-Monitoring
  • Author
    Munawar, Mohammad A. ; Jiang, Miao ; Reidemeister, Thomas ; Ward, Paul A S
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of Waterloo, Waterloo, ON, Canada
  • fYear
    2009
  • fDate
    14-18 Sept. 2009
  • Firstpage
    233
  • Lastpage
    242
  • Abstract
    Self-adaptive and self-organizing systems must be self-monitoring. Recent research has shown that self-monitoring can be enabled by using correlations between monitoring variables (metrics). However, computer systems often make a very large number of metrics available for collection. Collecting them all not only reduces system performance but also creates overheads related to communication, storage, and processing. To control this overhead, collection must be limited to a subset of the available metrics. Manual selection of metrics requires a good understanding of system internals, which can be difficult given the size and complexity of modern computer systems. In this paper, assuming no knowledge of metric semantics or importance and no advance availability of fault data, we investigate automated methods for selecting a subset of available metrics in the context of correlation-based monitoring. Our goal is to collect fewer metrics while maintaining the ability to detect errors. We propose several metric selection methods that require no information besides correlations, and we compare them on the basis of fault coverage. We show that our minimum spanning tree-based selection performs best, detecting on average 66% of the faults detectable by full monitoring (i.e., using all considered metrics) with only 30% of the metrics.
  • Keywords
    filtering theory; monitoring; self-adjusting systems; software metrics; trees (mathematics); computer systems; filtering system metrics; metric semantics; minimal correlation-based self-monitoring; minimum spanning tree; self-adaptive systems; self-organizing systems; Communication system control; Computer errors; Computerized monitoring; Fault detection; Filtering; Humans; Predictive models; Pressing; Software systems; System performance; adaptive monitoring; error detection; metric correlations; self-monitoring; subset selection;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Self-Adaptive and Self-Organizing Systems, 2009. SASO '09. Third IEEE International Conference on
  • Conference_Location
    San Francisco, CA
  • Print_ISBN
    978-1-4244-4890-6
  • Electronic_ISBN
    978-0-7695-3794-8
  • Type
    conf
  • DOI
    10.1109/SASO.2009.36
  • Filename
    5298441