Title :
Real-time failure prediction in online services
Author :
Shatnawi, Mohammed ; Hefeeda, Mohamed
Author_Institution :
Microsoft, Redmond, WA, USA
Date :
April 26 2015-May 1 2015
Abstract :
Current data mining techniques used to create failure predictors for online services require massive amounts of data to build, train, and test the predictors. These operations are tedious and time-consuming, and they are not done in real time. Also, the accuracy of the resulting predictor is highly compromised by changes that affect the environment and working conditions of the predictor. We propose a new approach to creating a dynamic failure predictor for online services in real time and keeping its accuracy high during the service's run-time changes. We use synthetic transactions during the run-time lifecycle to generate current data about the service. This data is used in its ephemeral state to build, train, test, and maintain an up-to-date failure predictor. We implemented the proposed approach in a large-scale online ad service that processes billions of requests each month in six data centers distributed across three continents. We show that the proposed predictor is able to maintain failure prediction accuracy as high as 86% during online service changes, whereas the accuracy of the state-of-the-art predictors may drop to less than 10%.
Keywords :
Web services; computer centres; contracts; data mining; failure analysis; real-time systems; system recovery; data mining technique; distributed data centers; dynamic failure predictor; large-scale online ad service; online service changes; real-time failure prediction; synthetic transactions; up-to-date failure predictor; working conditions; Accuracy; Data mining; Monitoring; Production; Real-time systems; Testing; Time factors
Conference_Title :
Computer Communications (INFOCOM), 2015 IEEE Conference on
Conference_Location :
Kowloon
DOI :
10.1109/INFOCOM.2015.7218516