DocumentCode
253263
Title
Highly compact virtual maximum likelihood sketches for counting big network data
Author
Zhen Mo ; Yan Qiao ; Shigang Chen ; Tao Li
Author_Institution
Dept. of Comput. & Inf. Sci. & Eng., Univ. of Florida, Gainesville, FL, USA
fYear
2014
fDate
Sept. 30 2014-Oct. 3 2014
Firstpage
1188
Lastpage
1195
Abstract
As the Internet moves into the era of big network data, it presents both opportunities and technical challenges for traffic measurement functions such as flow cardinality estimation, which estimates the number of distinct elements in each flow. Cardinality estimation has important applications in intrusion detection, resource management, billing, and capacity planning, as well as in big data analytics. Because network data must be processed at high volume and high speed, past research has striven to reduce the memory overhead of cardinality estimation over a large number of flows. One important thread of research in this area is based on sketches. Representative work includes the FM sketches [1], the LogLog sketches [2], and the HyperLogLog sketches [3]. Each sketch requires multiple bits, and many sketches are needed for each flow, which results in significant memory overhead. This paper proposes a new method, virtual maximum likelihood sketches, to reduce memory consumption. First, we design virtual sketches that use no more than two bits per sketch on average. Second, we design virtual sketch vectors that consider all flows together. Based on these new constructs, we design a flow cardinality solution with an online operation module and an offline estimation module. We also consider the problem of differentiated estimation, which gives high-priority flows better precision in their cardinality estimates. We implement the new solution and evaluate its performance experimentally on real traffic traces.
Keywords
Internet; telecommunication traffic; FM sketches; HyperLogLog sketches; LogLog sketches; big network data counting; differentiated estimation; flow cardinality estimation; flow cardinality solution; highly compact virtual maximum likelihood sketches; memory consumption reduction; offline estimation module; online operation module; traffic measurement functions; virtual sketch vector design; Accuracy; Arrays; Frequency modulation; Maximum likelihood estimation; Servers; Vectors
fLanguage
English
Publisher
IEEE
Conference_Titel
Communication, Control, and Computing (Allerton), 2014 52nd Annual Allerton Conference on
Conference_Location
Monticello, IL
Type
conf
DOI
10.1109/ALLERTON.2014.7028590
Filename
7028590
Link To Document