Title :
Shedding Light on Enterprise Network Failures Using Spotlight
Author :
John, Dipu ; Prakash, Pawan ; Kompella, Ramana Rao ; Chandra, Ranveer
Author_Institution :
Purdue Univ., West Lafayette, IN, USA
Date :
Oct. 31 2010-Nov. 3 2010
Abstract :
Fault localization in enterprise networks is extremely challenging. A recent approach called Sherlock makes headway on this problem by running an inference algorithm over a multi-tier probabilistic dependency graph that relates fault symptoms to possible root causes (e.g., routers, servers). Sherlock's key limitation is scalability, because it relies on complex inference algorithms based on Bayesian networks. We present a fault localization system called Spotlight that is built on two ideas. First, it compresses the multi-tier dependency graph into a bipartite graph with direct probabilistic edges between root causes and symptoms. Second, it runs a novel weighted greedy minimum set cover algorithm to provide fast inference. Through extensive simulations with real service dependency graphs and enterprise network topologies previously reported in the literature, we show that Spotlight is about 100× faster than Sherlock in typical settings, with comparable diagnostic accuracy.
Keywords :
belief networks; business data processing; fault tolerant computing; graph theory; inference mechanisms; probability; Bayesian networks; Sherlock approach; Spotlight system; bipartite graph; enterprise network failure; enterprise network topology; fault localization system; greedy minimum set cover algorithm; inference algorithm; multitier probabilistic dependency graph; service dependency graphs; Accuracy; Bayesian methods; Inference algorithms; Instruments; Network topology; Probabilistic logic; Servers; dependency graphs; enterprise networks; fault localization
Conference_Title :
Reliable Distributed Systems, 2010 29th IEEE Symposium on
Conference_Location :
New Delhi
Print_ISBN :
978-0-7695-4250-8
DOI :
10.1109/SRDS.2010.27
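Illustration :
The abstract describes inference as a weighted greedy minimum set cover over a bipartite graph of root causes and symptoms. The sketch below shows one generic way such a greedy cover could work. It is not the paper's algorithm: the scoring rule (sum of edge probabilities to still-uncovered symptoms), the function name `greedy_weighted_cover`, and the sample graph with its node names and probabilities are all assumptions made for illustration.

```python
from typing import Dict, List, Set

# Hypothetical bipartite graph: root cause -> {symptom: edge probability}.
Bipartite = Dict[str, Dict[str, float]]


def greedy_weighted_cover(graph: Bipartite, observed: Set[str]) -> List[str]:
    """Pick root causes greedily until every explainable symptom is covered.

    Score of a cause = sum of its edge probabilities to observed symptoms
    that are still uncovered (an assumed scoring rule, not the paper's).
    """
    uncovered = set(observed)
    chosen: List[str] = []
    while uncovered:
        best, best_score = None, 0.0
        for cause in sorted(graph):  # sorted for deterministic tie-breaking
            if cause in chosen:
                continue
            score = sum(p for s, p in graph[cause].items() if s in uncovered)
            if score > best_score:
                best, best_score = cause, score
        if best is None:  # remaining symptoms have no candidate cause
            break
        chosen.append(best)
        uncovered -= set(graph[best])  # every symptom linked to the pick is covered
    return chosen


# Made-up example data: three candidate root causes, three observed symptoms.
graph = {
    "router_r1": {"web_slow": 0.9, "mail_slow": 0.8},
    "server_web": {"web_slow": 0.7},
    "server_dns": {"mail_slow": 0.3, "login_fail": 0.6},
}
diagnosis = greedy_weighted_cover(graph, {"web_slow", "mail_slow", "login_fail"})
print(diagnosis)
```

Each round scans every remaining candidate once, so the cost grows with the number of graph edges times the number of causes selected. That is the general reason a greedy cover tends to be cheaper than exact inference over a Bayesian network, although the speedup figure quoted in the abstract comes from the authors' own evaluation, not from this sketch.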