• DocumentCode
    2299684
  • Title

    Research on the health diagnosis module of large-scale clusters

  • Author

    Cong Yang ; Wen-long Du

  • Author_Institution
    Cloud Comput. Res. Center, Shenzhen Inst. of Adv. Technol., Shenzhen, China
  • fYear
    2012
  • fDate
    29-31 Dec. 2012
  • Firstpage
    589
  • Lastpage
    593
  • Abstract
    Many low-level performance metrics, including process, virtual machine, and physical machine metrics, can be measured to identify the health status of a node or even a whole cluster. Traditionally, the nodes in a cluster are monitored, and managers must analyze every metric and every alarm message from the monitoring tools to determine the cluster's health status. However, this process spends too much time on insignificant metrics and is inefficient, because most clusters have hundreds of nodes and it is impossible for one manager to check so many metrics on every node. In this work, we demonstrate that time can be saved by simplifying the metric set, scoring each node, and diagnosing node health status with a decision tree. Specifically, this work first experimentally verifies and ranks the degree of relation between node health and different metrics. Second, we collect and score the training set through load increase testing. Third, we construct a decision tree from the training set. Finally, a health diagnosis module is composed of the preceding process, algorithm, and decision tree. We evaluate the Health Diagnosis Module (HDM) on a normal PC cluster. Experiments show that HDM can precisely diagnose the health status of nodes and clusters with an accuracy of more than 89%.
  • Keywords
    computer network performance evaluation; decision trees; virtual machines; HDM; PC cluster health status identification; decision tree; large-scale cluster health diagnosis module; load increase testing method; low-level performance metrics; node health status diagnosis; physical machine metrics; training set collection; training set scoring algorithm; virtual machine metrics; cloud computing; health diagnosis module
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science and Network Technology (ICCSNT), 2012 2nd International Conference on
  • Conference_Location
    Changchun
  • Print_ISBN
    978-1-4673-2963-7
  • Type

    conf

  • DOI
    10.1109/ICCSNT.2012.6526006
  • Filename
    6526006