DocumentCode
945924
Title
Monitoring High-Dimensional Data for Failure Detection and Localization in Large-Scale Computing Systems
Author
Chen, Haifeng ; Jiang, Guofei ; Yoshihira, Kenji
Author_Institution
NEC Lab. America, Princeton
Volume
20
Issue
1
fYear
2008
Firstpage
13
Lastpage
25
Abstract
It is a major challenge to process high-dimensional measurements for failure detection and localization in large-scale computing systems. However, it is observed that in information systems, those measurements are usually located in a low-dimensional structure that is embedded in the high-dimensional space. From this perspective, a novel approach is proposed to model the geometry of underlying data generation and detect anomalies based on that model. We consider both linear and nonlinear data generation models. Two statistics, that is, the Hotelling T² and the squared prediction error (SPE), are used to reflect data variations within and outside the model. We track the probabilistic density of extracted statistics to monitor the system's health. After a failure has been detected, a localization process is also proposed to find the most suspicious attributes related to the failure. Experimental results on both synthetic data and a real e-commerce application demonstrate the effectiveness of our approach in detecting and localizing failures in computing systems.
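As a rough illustration of the two statistics named in the abstract, the sketch below assumes the linear data generation model is ordinary PCA fitted on normal-operation measurements. The function names (`fit_pca`, `t2_and_spe`), the synthetic data, and the ranking of attributes by squared residual are assumptions made for illustration only; they are not taken from the paper and may differ from the authors' localization procedure.

```python
import numpy as np

def fit_pca(X, k):
    """Fit a k-component linear (PCA) model to the rows of X."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD of the centered data: rows of Vt are the principal directions.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T                            # (d, k) loading matrix
    var = (s[:k] ** 2) / (X.shape[0] - 1)   # variance of each retained score
    return mean, P, var

def t2_and_spe(x, mean, P, var):
    """Return Hotelling T^2, SPE, and the residual vector for one sample."""
    xc = x - mean
    scores = xc @ P                      # coordinates inside the model subspace
    t2 = float(np.sum(scores ** 2 / var))  # variation within the model
    residual = xc - scores @ P.T         # part the model cannot reconstruct
    spe = float(residual @ residual)     # variation outside the model
    return t2, spe, residual

# Synthetic data (assumed): 10 attributes driven by 2 latent factors.
rng = np.random.default_rng(0)
d, k, n = 10, 2, 500
theta = np.arange(d) * np.pi / d
A = np.column_stack([np.cos(theta), np.sin(theta)])   # (d, 2) mixing matrix
X_train = rng.normal(size=(n, k)) @ A.T + 0.05 * rng.normal(size=(n, d))

mean, P, var = fit_pca(X_train, k)

# One normal sample, and the same sample with a fault injected on attribute 7.
x_ok = rng.normal(size=k) @ A.T + 0.05 * rng.normal(size=d)
x_bad = x_ok.copy()
x_bad[7] += 5.0

t2_ok, spe_ok, _ = t2_and_spe(x_ok, mean, P, var)
t2_bad, spe_bad, res_bad = t2_and_spe(x_bad, mean, P, var)

# Naive localization (assumption): rank attributes by squared residual.
suspect = int(np.argmax(res_bad ** 2))
print(f"SPE normal={spe_ok:.4f}, SPE faulty={spe_bad:.4f}, suspect={suspect}")
```

In this sketch T² measures how far a sample moves inside the fitted subspace, while SPE measures how far it falls outside it, so a fault that breaks the correlation between attributes shows up mainly in SPE. A monitoring system would compare both statistics against thresholds estimated from normal data, which this sketch leaves out.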
Keywords
Internet; data mining; information systems; computing systems failures; e-commerce; failure detection; failure localization; high-dimensional data monitoring; large-scale computing systems; nonlinear data generation; squared prediction error; Internet applications; manifold learning; statistics
fLanguage
English
Journal_Title
Knowledge and Data Engineering, IEEE Transactions on
Publisher
IEEE
ISSN
1041-4347
Type
jour
DOI
10.1109/TKDE.2007.190674
Filename
4358960