DocumentCode :
187428
Title :
Failure Prediction of Jobs in Compute Clouds: A Google Cluster Case Study
Author :
Xin Chen ; Charng-Da Lu ; Pattabiraman, Karthik
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of British Columbia, Vancouver, BC, Canada
fYear :
2014
fDate :
3-6 Nov. 2014
Firstpage :
341
Lastpage :
346
Abstract :
Most cloud computing clusters are built from unreliable, commercial off-the-shelf components. The high failure rates in their hardware and software components result in frequent node and application failures. It is therefore important to predict application failures before they occur in order to avoid wasting resources. In this paper, we investigate how to identify application failures based on resource usage measurements from the Google cluster traces. We apply recurrent neural networks to the resource usage measurements and generate features to categorize the input resource usage time series into different classes. Our results show that the model is able to predict failures of batch applications, which are the dominant jobs in the Google cluster. Moreover, we explore early classification to identify failures, and find that the prediction algorithm gives the cloud system enough time to take proactive actions well before the applications terminate, yielding average resource savings of 6% to 10%.
Keywords :
cloud computing; fault tolerant computing; Google cluster case study; application failures; application termination; cloud computing clusters; cloud system; commercial off-the-shelf components; compute clouds; failure rates; hardware components; job failure prediction; prediction algorithm; resource savings; resource wastage; software components; Feature extraction; Google; Prediction algorithms; Recurrent neural networks; Reliability; Time measurement; Application failure; Cloud reliability; Failure prediction;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Software Reliability Engineering Workshops (ISSREW), 2014 IEEE International Symposium on
Conference_Location :
Naples
Type :
conf
DOI :
10.1109/ISSREW.2014.105
Filename :
6983864