DocumentCode
9933
Title
A Distributed Information Divergence Estimation over Data Streams
Author
Anceaume, Emmanuelle ; Busnel, Yann
Author_Institution
IRISA, Rennes, France
Volume
25
Issue
2
fYear
2014
fDate
Feb. 2014
Firstpage
478
Lastpage
487
Abstract
In this paper, we consider the setting of large-scale distributed systems, in which each node needs to quickly process a huge amount of data received in the form of a stream that may have been tampered with by an adversary. In this situation, a fundamental problem is how to detect and quantify the amount of work performed by the adversary. To address this issue, we propose a novel algorithm, AnKLe, for estimating the Kullback-Leibler divergence of an observed stream compared with the expected one. AnKLe combines sampling techniques and information-theoretic methods. It is very efficient, both in terms of space and time complexities, and requires only a single pass over the data stream. We show that AnKLe is an (ε, δ)-approximation algorithm with a space complexity of Õ(1/ε + 1/ε²) bits in "most" cases, and Õ(1/ε + (n − ε⁻¹)/ε²) otherwise, where n is the number of distinct data items in a stream. Moreover, we propose a distributed version of AnKLe that requires at most O(rℓ(log n + 1)) bits of communication between the ℓ participating nodes, where r is the number of rounds of the algorithm. Experimental results show that the estimation provided by AnKLe remains accurate even for different adversarial settings for which the quality of other methods dramatically decreases.
Keywords
computational complexity; AnKLe; Kullback-Leibler divergence; data streams; distributed information divergence estimation; distributed systems; information theoretic methods; space complexities; time complexities; Algorithm design and analysis; Approximation algorithms; Computational modeling; Data models; Entropy; Estimation; Radiation detectors; Data stream; byzantine adversary; performance analysis; randomized approximation algorithm
fLanguage
English
Journal_Title
Parallel and Distributed Systems, IEEE Transactions on
Publisher
IEEE
ISSN
1045-9219
Type
jour
DOI
10.1109/TPDS.2013.101
Filename
6494567