DocumentCode :
2194653
Title :
Sliding HyperLogLog: Estimating Cardinality in a Data Stream over a Sliding Window
Author :
Chabchoub, Yousra ; Hebrail, Georges
Author_Institution :
BILab, Telecom ParisTech, Paris, France
fYear :
2010
fDate :
13-13 Dec. 2010
Firstpage :
1297
Lastpage :
1303
Abstract :
In this paper, a new algorithm for estimating the number of active flows in a data stream is proposed. This algorithm adapts the HyperLogLog algorithm of Flajolet et al. to data stream processing by adding a sliding window mechanism. Its advantage is that it can estimate, at any time, the number of flows seen over any duration bounded by the length of the sliding window. The estimate is very accurate, with a standard error of about 1.04/√m (the same as in the HyperLogLog algorithm), where m is the number of registers in the required memory. Because the new algorithm answers more flexible queries, it needs additional memory compared to the HyperLogLog algorithm. It is proved that the total required memory is at most 5m ln(n/m) bytes, where n is the actual number of flows in the sliding window. For instance, with a memory of only 35 kB, a standard error of about 3% can be achieved for a data stream of several million flows. Theoretical results are validated on both real and synthetic traffic.
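The two formulas quoted in the abstract can be checked with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and the values m = 1024 registers and n = 1,000,000 flows are assumptions chosen to land near the abstract's "35 kB, about 3%" example, not parameters taken from the paper.

```python
import math


def standard_error(m: int) -> float:
    """Relative standard error of the estimate, about 1.04 / sqrt(m)."""
    return 1.04 / math.sqrt(m)


def memory_bound_bytes(n: int, m: int) -> float:
    """Upper bound on total memory from the abstract: 5 * m * ln(n / m) bytes."""
    return 5 * m * math.log(n / m)


# Assumed example values (not from the paper): 1024 registers, one million flows.
m = 1024
n = 1_000_000

print(f"standard error: {standard_error(m):.2%}")
print(f"memory bound:   {memory_bound_bytes(n, m) / 1000:.1f} kB")
```

Under these assumed values the standard error comes out at roughly 3.25% and the memory bound at roughly 35 kB, which agrees with the figures given in the abstract.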
Keywords :
data handling; query processing; storage management; HyperLogLog algorithm; active flows; data stream processing; flexible queries; memory storage; sliding window; approximation algorithms; data stream; hashing
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining Workshops (ICDMW), 2010 IEEE International Conference on
Conference_Location :
Sydney, NSW
Print_ISBN :
978-1-4244-9244-2
Electronic_ISBN :
978-0-7695-4257-7
Type :
conf
DOI :
10.1109/ICDMW.2010.18
Filename :
5693443