• DocumentCode
    1808936
  • Title
    Distinct element counting in distributed dynamic data streams
  • Author
    Wenji Chen ; Yong Guan
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Iowa State Univ., Ames, IA, USA
  • fYear
    2015
  • fDate
    April 26 2015-May 1 2015
  • Firstpage
    2371
  • Lastpage
    2379
  • Abstract
    We consider a new type of distinct element counting problem in dynamic data streams, where (1) insertions and deletions of an element can appear not only in the same data stream but also in two or more different streams, (2) a deletion of a distinct element cancels out all previous insertions of that element, and (3) a distinct element can be re-inserted after it has been deleted. Our goal is to count the number of distinct elements that have been inserted but not deleted in a continuous data stream. We also solve this new type of distinct element counting problem in a distributed setting. The problem is motivated by several network monitoring and attack detection applications in which network traffic can be modelled as single or distributed dynamic streams, and the number of distinct elements in the data streams, such as unsuccessful TCP connection setup requests, is used as an indicator to detect network events such as service outages and DDoS attacks. Although tight bounds are known for distinct element counting in insertion-only data streams, no good bounds are known for dynamic data streams or for this new type of problem, and none of the existing solutions for distinct element counting can solve it. In this paper, we present the first solution to this problem, using a space-bounded data structure with a computation-efficient probabilistic data streaming algorithm to estimate the number of distinct elements in single or distributed dynamic data streams. We evaluate the algorithm through both theoretical analysis and experiments on synthetic and real data traces to show its effectiveness.
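    The three rules in the abstract can be made concrete with an exact baseline. The sketch below is not the paper's algorithm: it keeps every live element in memory, so it is neither space-bounded nor probabilistic, and it only pins down which count the estimator is meant to approximate. The event format `(timestamp, op, element)`, the function name `count_live_distinct`, and the use of timestamps to order events across streams are assumptions made for illustration.

```python
import heapq
from typing import Hashable, Iterable, Tuple

# Hypothetical event format: (timestamp, op, element),
# where op is "+" for an insertion and "-" for a deletion.
Event = Tuple[int, str, Hashable]


def count_live_distinct(streams: Iterable[Iterable[Event]]) -> int:
    """Exact count of elements inserted but not currently deleted.

    Assumes each stream is already sorted by timestamp; the streams
    are merged into a single timeline before being processed.
    """
    live = set()
    for _ts, op, elem in heapq.merge(*streams, key=lambda e: e[0]):
        if op == "+":
            # Repeated insertions of a live element do not change the count.
            live.add(elem)
        elif op == "-":
            # Rule (2): one deletion cancels all previous insertions.
            live.discard(elem)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return len(live)


# Rule (1): "x" is inserted in stream_a and deleted in stream_b.
# Rule (3): "w" is inserted, deleted, then re-inserted.
stream_a = [(1, "+", "x"), (2, "+", "x"), (5, "+", "y"), (7, "+", "z")]
stream_b = [(3, "-", "x"), (4, "+", "w"), (6, "-", "w"), (8, "+", "w")]

print(count_live_distinct([stream_a, stream_b]))  # 3 (y, z, w)
```

    The set grows linearly with the number of live elements, which is the cost a space-bounded estimator is meant to avoid.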
  • Keywords
    computer network security; transport protocols; DDoS attacks; TCP connection; attack detection applications; continuous data stream; distinct element counting; distributed dynamic data streams; distributed setting; network monitoring; network traffic; probabilistic data streaming algorithm; service outage; space bounded data structure; Computers; Data structures; Distributed databases; Estimation; Heuristic algorithms; Monitoring; Servers;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Communications (INFOCOM), 2015 IEEE Conference on
  • Conference_Location
    Kowloon
  • Type
    conf
  • DOI
    10.1109/INFOCOM.2015.7218625
  • Filename
    7218625