• DocumentCode
    253263
  • Title
    Highly compact virtual maximum likelihood sketches for counting big network data
  • Author
    Zhen Mo ; Yan Qiao ; Shigang Chen ; Tao Li
  • Author_Institution
    Dept. of Comput. & Inf. Sci. & Eng., Univ. of Florida, Gainesville, FL, USA
  • fYear
    2014
  • fDate
    Sept. 30 2014-Oct. 3 2014
  • Firstpage
    1188
  • Lastpage
    1195
  • Abstract
    As the Internet moves into the era of big network data, it presents both opportunities and technical challenges for traffic measurement functions such as flow cardinality estimation, which estimates the number of distinct elements in each flow. Cardinality estimation has important applications in intrusion detection, resource management, billing, and capacity planning, as well as big data analytics. Because network data must be processed at high volume and high speed, past research has striven to reduce the memory overhead of cardinality estimation over a large number of flows. One important thread of research in this area is based on sketches. Representative work includes the FM sketches [1], the LogLog sketches [2], and the HyperLogLog sketches [3]. Each sketch requires multiple bits, and many sketches are needed for each flow, which results in significant memory overhead. This paper proposes a new method, virtual maximum likelihood sketches, to reduce memory consumption. First, we design virtual sketches that use no more than two bits per sketch on average. Second, we design virtual sketch vectors that consider all flows together. Based on these new constructs, we design a flow cardinality solution with an online operation module and an offline estimation module. We also consider the problem of differentiated estimation, which gives high-priority flows better precision in their cardinality estimates. We implement the new solution and evaluate its performance experimentally on real traffic traces.
  • Keywords
    Internet; telecommunication traffic; FM sketches; HyperLogLog sketches; LogLog sketches; big network data counting; differentiated estimation; flow cardinality estimation; flow cardinality solution; highly compact virtual maximum likelihood sketches; memory consumption reduction; offline estimation module; online operation module; traffic measurement functions; virtual sketch vector design; Accuracy; Arrays; Frequency modulation; Maximum likelihood estimation; Servers; Vectors
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    2014 52nd Annual Allerton Conference on Communication, Control, and Computing (Allerton)
  • Conference_Location
    Monticello, IL
  • Type
    conf
  • DOI
    10.1109/ALLERTON.2014.7028590
  • Filename
    7028590