Title :
Efficient and Scalable Algorithms for Inferring Likely Invariants in Distributed Systems
Author :
Jiang, Guofei ; Chen, Haifeng ; Yoshihira, Kenji
Author_Institution :
NEC Lab. America Inc., Princeton
Abstract :
Distributed systems generate a large amount of monitoring data, such as log files, to track their operational status. However, it is hard to correlate such monitoring data effectively across distributed systems and over time for system management. In previous work, we proposed a concept named flow intensity to measure the intensity with which internal monitoring data reacts to the volume of user requests. We calculated flow intensity measurements from monitoring data and proposed an algorithm to automatically search for constant relationships between flow intensities measured at various points across distributed systems. If such relationships hold all the time, we regard them as invariants of the underlying systems. Invariants can be used to characterize complex systems and support various system management tasks. However, the computational complexity of the previous invariant search algorithm is so high that it may not scale well in large systems with thousands of measurements. In this paper, we propose two efficient but approximate algorithms for inferring invariants in large-scale systems. The computational complexity of the new randomized algorithms is significantly reduced, and experimental results from a real system are included to demonstrate the accuracy and efficiency of our new algorithms.
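The abstract does not say which model defines a "constant relationship" between two flow intensities, so the sketch below assumes a simple linear relation y ≈ a·x + b, scored with an R²-style fitness value. It illustrates only the exhaustive pairwise search whose cost motivates the paper: every pair of n measurements is a candidate, giving n(n−1)/2 model fits (499,500 for n = 1,000). It is not the authors' randomized algorithm. The names `fit_linear`, `fitness`, `search_invariants`, `threshold`, and `n_windows` are hypothetical, and "holds all the time" is approximated by requiring the relationship fitted on the first time window to score above the threshold on every window.

```python
import itertools

import numpy as np


def fit_linear(x, y):
    """Least-squares fit of y ~ a*x + b (assumed relationship model)."""
    A = np.column_stack([x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(coef[0]), float(coef[1])


def fitness(a, b, x, y):
    """R^2-style score of a fixed (a, b) on data; 1.0 means a perfect fit."""
    resid = y - (a * x + b)
    tss = float(np.sum((y - y.mean()) ** 2))
    if tss == 0.0:
        return 0.0  # constant target: treat as no usable relationship
    return 1.0 - float(np.sum(resid ** 2)) / tss


def search_invariants(series, n_windows=4, threshold=0.9):
    """Brute-force search over all n(n-1)/2 pairs of flow-intensity series.

    A pair is kept as a (hypothetical) invariant if the model fitted on the
    first window still scores >= threshold on every window.
    """
    names = list(series)
    length = len(next(iter(series.values())))
    bounds = np.linspace(0, length, n_windows + 1).astype(int)
    windows = [slice(bounds[i], bounds[i + 1]) for i in range(n_windows)]

    invariants = {}
    for u, v in itertools.combinations(names, 2):  # quadratic number of pairs
        x = np.asarray(series[u], dtype=float)
        y = np.asarray(series[v], dtype=float)
        a, b = fit_linear(x[windows[0]], y[windows[0]])
        if all(fitness(a, b, x[w], y[w]) >= threshold for w in windows):
            invariants[(u, v)] = (a, b)
    return invariants


# Synthetic example: db_queries tracks web_requests linearly; "unrelated" is noise.
rng = np.random.default_rng(0)
t = np.arange(200, dtype=float)
load = 50.0 + 10.0 * np.sin(t / 10.0)
series = {
    "web_requests": load,
    "db_queries": 3.0 * load + 5.0,
    "unrelated": rng.normal(size=200),
}
invariants = search_invariants(series)
print(sorted(invariants))
```

Because the loop visits every pair, the cost grows quadratically with the number of measurements. Approximate or randomized schemes such as those the paper proposes aim to avoid fitting every one of these pairs.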
Keywords :
computational complexity; distributed processing; randomised algorithms; approximate algorithms; data monitoring; distributed systems; invariant inferring; large-scale systems; log files; operational status tracking; randomized algorithms; system management; Computational complexity; Computerized monitoring; Costs; Fault detection; Fluid flow measurement; Hardware; Helium; Large-scale systems; Management information systems; Volume measurement; Algorithms for data and knowledge management; Analysis of Algorithms and Problem Complexity; Data mining; Distributed Systems; System Management; Time series analysis;
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
DOI :
10.1109/TKDE.2007.190648