DocumentCode :
1168624
Title :
Adaptive diagnosis in distributed systems
Author :
Rish, Irina ; Brodie, Mark ; Ma, Sheng ; Odintsova, Natalia ; Beygelzimer, Alina ; Grabarnik, Genady ; Hernandez, Karina
Author_Institution :
IBM T.J. Watson Res. Center, Hawthorne, NY, USA
Volume :
16
Issue :
5
fYear :
2005
Firstpage :
1088
Lastpage :
1109
Abstract :
Real-time problem diagnosis in large distributed computer systems and networks is a challenging task that requires fast and accurate inferences from potentially huge data volumes. In this paper, we propose a cost-efficient, adaptive diagnostic technique called active probing. Probes are end-to-end test transactions that collect information about the performance of a distributed system. Active probing combines probabilistic reasoning techniques with an information-theoretic approach, and allows fast online inference about the current system state via active selection of only a small number of the most informative tests. We demonstrate empirically that the active probing scheme greatly reduces both the number of probes (by 60% to 75% in most of our real-life applications) and the time needed for localizing the problem when compared with nonadaptive (preplanned) probing schemes. We also provide some theoretical results on the complexity of probe selection and on the effect of "noisy" probes on the accuracy of diagnosis. Finally, we discuss how to model the system's dynamics using dynamic Bayesian networks (DBNs) and an efficient approximate approach called sequential multifault; empirical results demonstrate a clear advantage of such approaches over "static" techniques that do not handle changes in the system.
Keywords :
belief networks; distributed processing; inference mechanisms; information theory; probability; real-time systems; transaction processing; active probing; adaptive diagnosis; computer networks; distributed computer systems; distributed systems; dynamic Bayesian networks; end-to-end transaction; information gain; information-theoretic approach; probabilistic inference; probabilistic reasoning; probe selection complexity; real-time problem diagnosis; sequential multifault approach; Airplanes; Bayesian methods; Computer networks; Current measurement; Distributed computing; Instruments; Intelligent networks; Medical diagnosis; Probes; System testing; Bayesian networks (BNs); computer networks; diagnosis; distributed systems; end-to-end transactions; information gain; probabilistic inference; Algorithms; Artifacts; Artificial Intelligence; Computer Communication Networks; Computer Simulation; Information Storage and Retrieval; Models, Statistical; Pattern Recognition, Automated; Signal Processing, Computer-Assisted; Telecommunications;
fLanguage :
English
Journal_Title :
Neural Networks, IEEE Transactions on
Publisher :
IEEE
ISSN :
1045-9227
Type :
jour
DOI :
10.1109/TNN.2005.853423
Filename :
1510712