• DocumentCode
    244929
  • Title
    Identifying Recurrent and Unknown Performance Issues
  • Author
    Meng-Hui Lim ; Jian-Guang Lou ; Hongyu Zhang ; Qiang Fu ; Teoh, Andrew Beng Jin ; Qingwei Lin ; Rui Ding ; Dongmei Zhang
  • Author_Institution
    Dept. of Comput. Sci., Hong Kong Baptist Univ., Hong Kong, China
  • fYear
    2014
  • fDate
    14-17 Dec. 2014
  • Firstpage
    320
  • Lastpage
    329
  • Abstract
    For a large-scale software system, especially an online service system, when a performance issue occurs, it is desirable to check whether this issue has occurred before. If there are past similar issues, a known remedy could be applied. Otherwise, a new troubleshooting process may have to be initiated. The symptom of a performance issue can be characterized by a set of metrics. Due to the sophisticated nature of software systems, manual diagnosis of performance issues based on metric data is typically expensive and laborious. In this paper, we propose a Hidden Markov Random Field (HMRF) based approach to automatic identification of recurrent and unknown performance issues. We formulate the problem of issue identification as an HMRF-based clustering problem. Our approach incorporates the learning of metric discretization thresholds and the optimization of issue clustering. Based on the learned thresholds and cluster centroids, we can achieve accurate identification of recurrent issues and unknown issues. Experimental evaluations on an open benchmark and a large-scale industrial production system show that our approach is effective and outperforms the related state-of-the-art approaches.
  • Keywords
    hidden Markov models; pattern clustering; HMRF-based clustering problem; automatic identification; cluster centroids; hidden Markov random field; issue clustering; issue identification; large-scale industrial production system; large-scale software system; metric data; metric discretization thresholds; online service system; troubleshooting process; unknown performance issues; Clustering algorithms; Fingerprint recognition; Hidden Markov models; Measurement; Monitoring; Production systems; Vectors; Issue identification; automated diagnosis; duplication detection; metrics; performance;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining (ICDM), 2014 IEEE International Conference on
  • Conference_Location
    Shenzhen
  • ISSN
    1550-4786
  • Print_ISBN
    978-1-4799-4303-6
  • Type
    conf
  • DOI
    10.1109/ICDM.2014.96
  • Filename
    7023349