Title :
Hard Drive Failure Prediction Using Classification and Regression Trees
Author :
Jing Li ; Xinpu Ji ; Yuhan Jia ; Bingpeng Zhu ; Gang Wang ; Zhongwei Li ; Xiaoguang Liu
Author_Institution :
Nankai-Baidu Joint Lab., Nankai Univ., Tianjin, China
Abstract :
Several statistical and machine learning methods have been proposed for building hard drive failure prediction models based on SMART attributes, and they have achieved good prediction performance. However, these models were not evaluated in the way they are used in real-world data centers. Moreover, hard drives deteriorate gradually, but these models cannot describe this gradual change precisely. This paper proposes new hard drive failure prediction models based on Classification and Regression Trees, which outperform the state-of-the-art model, the backpropagation artificial neural network, in prediction performance as well as stability and interpretability. Experiments demonstrate that the Classification Tree (CT) model predicts over 95% of failures at a false alarm rate (FAR) under 0.1% on a real-world dataset containing 25,792 drives. Aiming at the practical application of prediction models, we test them with different drive families, with fewer drives, and with different model updating strategies. The CT model still shows steady and good performance. We also propose a health degree model based on Regression Tree (RT), which gives each drive a health assessment rather than a simple classification result. Warnings raised by the prediction model can therefore be handled in order of the drives' health degrees. We implement a reliability model for RAID-6 systems with proactive fault tolerance and show that our CT model can significantly improve the reliability and/or reduce construction and maintenance cost of large-scale storage systems.
Keywords :
RAID; disc drives; fault tolerant computing; hard discs; pattern classification; regression analysis; FAR; RAID-6 systems; SMART attributes; backpropagation artificial neural network; classification trees; data centers; false alarm rate; hard drive failure prediction; health degree model; health degrees; large-scale storage systems; machine learning methods; prediction interpretability; prediction performance; prediction stability; redundant array of independent disks; regression trees; statistical methods; Data models; Hidden Markov models; Prediction algorithms; Predictive models; Regression tree analysis; Reliability; Training; CART; Hard drive failure prediction; Health degree; SMART;
Conference_Title :
Dependable Systems and Networks (DSN), 2014 44th Annual IEEE/IFIP International Conference on
Conference_Location :
Atlanta, GA
DOI :
10.1109/DSN.2014.44