Title :
Distinct element counting in distributed dynamic data streams
Author :
Wenji Chen ; Yong Guan
Author_Institution :
Dept. of Electr. & Comput. Eng., Iowa State Univ., Ames, IA, USA
fDate :
April 26 2015-May 1 2015
Abstract :
We consider a new type of distinct element counting problem in dynamic data streams, where (1) insertions and deletions of an element can appear not only in the same data stream but also in two or more different streams, (2) a deletion of a distinct element cancels out all the previous insertions of this element, and (3) a distinct element can be re-inserted after it has been deleted. Our goal is to count the number of distinct elements that were inserted but have not been deleted in a continuous data stream. We also solve this new type of distinct element counting problem in a distributed setting. This problem is motivated by several network monitoring and attack detection applications where network traffic can be modelled as single or distributed dynamic streams and the number of distinct elements in the data streams, such as unsuccessful TCP connection setup requests, is calculated to be used as an indicator to detect certain network events such as service outage and DDoS attacks. Although there are known tight bounds for distinct element counting in insertion-only data streams, no good bounds are known for it in dynamic data streams, neither for this new type of problem. None of the existing solutions for distinct element counting can solve our problem. In this paper, we will present the first solution to this problem, using a space-bounded data structure with a computation-efficient probabilistic data streaming algorithm to estimate the number of distinct elements in single or distributed dynamic data streams. We have done both theoretical analysis and experimental evaluations, using synthetic and real data traces, of our algorithm to show its effectiveness.
Keywords :
computer network security; transport protocols; DDoS attacks; TCP connection; attack detection applications; continuous data stream; distinct element counting; distributed dynamic data streams; distributed setting; network monitoring; network traffic; probabilistic data streaming algorithm; service outage; space bounded data structure; Computers; Data structures; Distributed databases; Estimation; Heuristic algorithms; Monitoring; Servers;
Conference_Titel :
Computer Communications (INFOCOM), 2015 IEEE Conference on
Conference_Location :
Kowloon
DOI :
10.1109/INFOCOM.2015.7218625