DocumentCode
1808936
Title
Distinct element counting in distributed dynamic data streams
Author
Wenji Chen ; Yong Guan
Author_Institution
Dept. of Electr. & Comput. Eng., Iowa State Univ., Ames, IA, USA
fYear
2015
fDate
April 26 2015-May 1 2015
Firstpage
2371
Lastpage
2379
Abstract
We consider a new type of distinct element counting problem in dynamic data streams, where (1) insertions and deletions of an element can appear not only in the same data stream but also in two or more different streams, (2) a deletion of a distinct element cancels out all the previous insertions of this element, and (3) a distinct element can be re-inserted after it has been deleted. Our goal is to count the number of distinct elements that were inserted but have not been deleted in a continuous data stream. We also solve this new type of distinct element counting problem in a distributed setting. This problem is motivated by several network monitoring and attack detection applications where network traffic can be modelled as single or distributed dynamic streams and the number of distinct elements in the data streams, such as unsuccessful TCP connection setup requests, is calculated to be used as an indicator to detect certain network events such as service outage and DDoS attacks. Although there are known tight bounds for distinct element counting in insertion-only data streams, no good bounds are known for it in dynamic data streams, neither for this new type of problem. None of the existing solutions for distinct element counting can solve our problem. In this paper, we will present the first solution to this problem, using a space-bounded data structure with a computation-efficient probabilistic data streaming algorithm to estimate the number of distinct elements in single or distributed dynamic data streams. We have done both theoretical analysis and experimental evaluations, using synthetic and real data traces, of our algorithm to show its effectiveness.
Keywords
computer network security; transport protocols; DDoS attacks; TCP connection; attack detection applications; continuous data stream; distinct element counting; distributed dynamic data streams; distributed setting; network monitoring; network traffic; probabilistic data streaming algorithm; service outage; space bounded data structure; Computers; Data structures; Distributed databases; Estimation; Heuristic algorithms; Monitoring; Servers;
fLanguage
English
Publisher
ieee
Conference_Titel
Computer Communications (INFOCOM), 2015 IEEE Conference on
Conference_Location
Kowloon
Type
conf
DOI
10.1109/INFOCOM.2015.7218625
Filename
7218625
Link To Document