DocumentCode :
1379410
Title :
Robust monitoring of network-wide aggregates through gossiping
Author :
Wuhib, Fetahi ; Dam, Mads ; Stadler, Rolf ; Clemm, Alexander
Author_Institution :
ACCESS Linnaeus Center, KTH Royal Institute of Technology, Stockholm, Sweden
Volume :
6
Issue :
2
fYear :
2009
fDate :
6/1/2009 12:00:00 AM
Firstpage :
95
Lastpage :
109
Abstract :
We investigate the use of gossip protocols for continuous monitoring of network-wide aggregates under crash failures. Aggregates are computed from local management variables using functions such as SUM, MAX, or AVERAGE. For this type of aggregation, crash failures offer a particular challenge due to the problem of mass loss, namely, how to correctly account for contributions from nodes that have failed. In this paper we give a partial solution. We present G-GAP, a gossip protocol for continuous monitoring of aggregates, which is robust against failures that are discontiguous in the sense that neighboring nodes do not fail within a short period of each other. We give formal proofs of correctness and convergence, and we evaluate the protocol through simulation using real traces. The simulation results suggest that the design goals for this protocol have been met. For instance, the tradeoff between estimation accuracy and protocol overhead can be controlled, and a high estimation accuracy (below some 5% error in our measurements) is achieved by the protocol, even for large networks and frequent node failures. Further, we perform a comparative assessment of G-GAP against a tree-based aggregation protocol using simulation. Surprisingly, we find that the tree-based aggregation protocol consistently outperforms the gossip protocol for comparative overhead, both in terms of accuracy and robustness.
Keywords :
distributed algorithms; monitoring; protocols; crash failures; gossip protocols; network-wide aggregates; robust monitoring; tree-based aggregation protocol; Aggregates; Computer crashes; Condition monitoring; Database systems; Distributed algorithms; Error correction; Fault tolerant systems; Helium; Protocols; Robustness; Gossip protocol; epidemic protocol; aggregation; real-time monitoring
fLanguage :
English
Journal_Title :
Network and Service Management, IEEE Transactions on
Publisher :
IEEE
ISSN :
1932-4537
Type :
jour
DOI :
10.1109/TNSM.2009.090603
Filename :
5374830