DocumentCode
2194653
Title
Sliding HyperLogLog: Estimating Cardinality in a Data Stream over a Sliding Window
Author
Chabchoub, Yousra ; Hebrail, Georges
Author_Institution
BILab, Telecom ParisTech, Paris, France
fYear
2010
fDate
13-13 Dec. 2010
Firstpage
1297
Lastpage
1303
Abstract
This paper proposes a new algorithm for estimating the number of active flows in a data stream. The algorithm adapts the HyperLogLog algorithm of Flajolet et al. to data stream processing by adding a sliding window mechanism. At any time, it can estimate the number of flows seen over any duration bounded by the length of the sliding window. The estimate is accurate, with a standard error of about 1.04/√m (the same as in the HyperLogLog algorithm), where m is the number of registers held in memory. Because the new algorithm answers more flexible queries, it needs more memory than HyperLogLog: the total required memory is proved to be at most 5m ln(n/m) bytes, where n is the actual number of flows in the sliding window. For instance, with only 35 kB of memory, a standard error of about 3% can be achieved for a data stream of several million flows. Theoretical results are validated on both real and synthetic traffic.
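The abstract names the mechanism but does not describe the data structure, so the following Python sketch is only one plausible reading, not the authors' implementation. It assumes that each register keeps a short list of (timestamp, rank) pairs that could still be the register maximum for some window ending now or later, and that timestamps arrive in non-decreasing order. The class name, the method names, the SHA-1 based 64-bit hash, and the parameter b (number of index bits, assumed to be at least 7 so that the bias constant used here applies) are all illustrative choices. The two helper functions simply evaluate the formulas quoted in the abstract.

```python
import hashlib
import math


def standard_error(m):
    """Standard error quoted in the abstract: about 1.04 / sqrt(m)."""
    return 1.04 / math.sqrt(m)


def memory_bound_bytes(m, n):
    """Upper bound on memory quoted in the abstract: 5 * m * ln(n / m) bytes."""
    return 5 * m * math.log(n / m)


class SlidingHyperLogLog:
    """Illustrative sliding-window HyperLogLog (hypothetical layout and names)."""

    def __init__(self, b=10, window=60.0):
        self.b = b                    # number of hash bits used as register index
        self.m = 1 << b               # number of registers
        self.window = window          # maximum window length W
        # Per register: (timestamp, rank) pairs with increasing timestamps
        # and strictly decreasing ranks.
        self.registers = [[] for _ in range(self.m)]

    @staticmethod
    def _hash64(item):
        # Assumption: any well-mixed 64-bit hash would do; SHA-1 is used for brevity.
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, item, timestamp):
        """Record one stream element; timestamps must be non-decreasing."""
        h = self._hash64(item)
        width = 64 - self.b
        j = h >> width                          # leading b bits select the register
        rest = h & ((1 << width) - 1)           # remaining bits give the rank
        rank = width - rest.bit_length() + 1    # position of the leftmost 1-bit
        # Drop entries that left the window or are dominated by the new one.
        kept = [(t, r) for (t, r) in self.registers[j]
                if t > timestamp - self.window and r > rank]
        kept.append((timestamp, rank))
        self.registers[j] = kept

    def estimate(self, now, duration):
        """Estimate distinct items seen in (now - duration, now]."""
        if duration > self.window:
            raise ValueError("duration exceeds the sliding window length")
        maxima = []
        for entries in self.registers:
            live = [r for (t, r) in entries if t > now - duration]
            maxima.append(max(live, default=0))
        alpha = 0.7213 / (1 + 1.079 / self.m)   # HyperLogLog constant for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in maxima)
        zeros = maxima.count(0)
        if raw <= 2.5 * self.m and zeros > 0:
            # Small-range correction (linear counting), as in HyperLogLog.
            return self.m * math.log(self.m / zeros)
        return raw


if __name__ == "__main__":
    sketch = SlidingHyperLogLog(b=10, window=100.0)
    for i in range(20000):
        sketch.add(f"flow-{i}", i * 0.001)
    print(sketch.estimate(now=20.0, duration=100.0))
    print(standard_error(sketch.m), memory_bound_bytes(sketch.m, 1_000_000))
```

With m = 1024 registers, the standard error formula gives 1.04/32 = 3.25%, and for n = one million flows the memory bound evaluates to roughly 35 kB, which is in line with the figures given in the abstract. A query over a shorter duration reuses the same per-register lists and only changes the timestamp cut-off, which is what makes the window length a query-time parameter in this sketch.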
Keywords
data handling; query processing; storage management; HyperLogLog algorithm; active flows; data stream processing; flexible queries; memory storage; sliding window; approximation algorithms; data stream; hashing
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining Workshops (ICDMW), 2010 IEEE International Conference on
Conference_Location
Sydney, NSW
Print_ISBN
978-1-4244-9244-2
Electronic_ISBN
978-0-7695-4257-7
Type
conf
DOI
10.1109/ICDMW.2010.18
Filename
5693443