• DocumentCode
    2194653
  • Title

    Sliding HyperLogLog: Estimating Cardinality in a Data Stream over a Sliding Window

  • Author

    Chabchoub, Yousra ; Hebrail, Georges

  • Author_Institution
    BILab, Telecom ParisTech, Paris, France
  • fYear
    2010
  • fDate
    13-13 Dec. 2010
  • Firstpage
    1297
  • Lastpage
    1303
  • Abstract
    In this paper, a new algorithm for estimating the number of active flows in a data stream is proposed. The algorithm adapts the HyperLogLog algorithm of Flajolet et al. to data stream processing by adding a sliding window mechanism. Its advantage is that it can estimate, at any time, the number of flows seen over any duration bounded by the length of the sliding window. The estimate is very accurate, with a standard error of about 1.04/√m (the same as in the HyperLogLog algorithm), where m is the number of registers in the required memory. Because the new algorithm answers more flexible queries, it needs additional memory compared to the HyperLogLog algorithm. It is proved that the total required memory is at most 5m ln(n/m) bytes, where n is the true number of flows in the sliding window. For instance, with a memory of only 35 kB, a standard error of about 3% can be achieved for a data stream of several million flows. Theoretical results are validated on both real and synthetic traffic.
  • Keywords
    data handling; query processing; storage management; HyperLogLog algorithm; active flows; data stream processing; flexible queries; memory storage; sliding window; approximation algorithms; data stream; hashing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining Workshops (ICDMW), 2010 IEEE International Conference on
  • Conference_Location
    Sydney, NSW
  • Print_ISBN
    978-1-4244-9244-2
  • Electronic_ISBN
    978-0-7695-4257-7
  • Type

    conf

  • DOI
    10.1109/ICDMW.2010.18
  • Filename
    5693443
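
The two formulas quoted in the abstract, the standard error 1.04/√m and the memory bound 5m ln(n/m) bytes, can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the values m = 1024 registers and n = 1,000,000 flows are assumptions chosen to land near the abstract's "35 kB, about 3%" example; they are not taken from the paper.

```python
import math


def hll_standard_error(m: int) -> float:
    """Approximate standard error 1.04/sqrt(m), as quoted in the abstract."""
    return 1.04 / math.sqrt(m)


def sliding_hll_memory_bound_bytes(m: int, n: int) -> float:
    """Upper bound 5*m*ln(n/m) bytes, as quoted in the abstract.

    m is the number of registers, n the true number of flows in the window.
    """
    return 5 * m * math.log(n / m)


# Assumed illustrative values (not from the paper).
m = 1024
n = 1_000_000

print(f"standard error ~ {hll_standard_error(m):.2%}")
print(f"memory bound   ~ {sliding_hll_memory_bound_bytes(m, n) / 1000:.1f} kB")
```

Under these assumed values the standard error comes out at 3.25% and the bound at roughly 35 kB, the same order as the figures in the abstract. Since the bound grows only logarithmically in n, a larger flow count raises the memory requirement slowly.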