Author_Institution :
Dept. of Electr. & Comput. Eng., Iowa State Univ., Ames, IA, USA
Abstract :
There is a long history of work on space-efficient data structures that support approximate membership queries, starting with Bloom's work in the 1970s. Given a set A of n items and an additional item x from the same universe U of size m ≫ n, we want to determine whether x ∈ A using limited space. Solutions to the membership query are needed in many network applications, such as cache directories, load balancing, and security. If A is static, there exist optimal algorithms that construct a randomized data structure representing A using only (1 + o(1)) n log(1/δ) bits, which allows a small false-positive probability δ but no false negatives. However, existing optimal algorithms are not practical for many Internet applications, e.g., social network services, peer-to-peer systems, and network traffic monitoring. They are too space- and time-expensive when the set A changes frequently, because each change requires all items in order to recompute the optimal data structure, which takes linear time. In this paper, we propose a novel data structure to support the approximate membership query in the time-decaying window model. In this model, items are inserted one by one over a data stream, and we want to determine whether an item is among the most recent w items for any given window size w ≤ n. Our data structure requires only O(n(log(1/δ) + log n)) bits and O(1) running time. We also prove a non-trivial space lower bound of (n - δm) log(n - δm) bits, which guarantees that our data structure is near-optimal. Our data structure has been evaluated using both synthetic and real data sets.
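The time-decaying window query described above can be illustrated with a minimal Python sketch. This is not the data structure proposed in the paper, whose construction the abstract does not describe. It only assumes that each stored entry pairs a short fingerprint (on the order of log(1/δ) bits) with the arrival index of the most recent item carrying that fingerprint (on the order of log n bits), which mirrors the two terms of the stated space bound. The class name, the SHA-256-based fingerprint, and the lazy purge policy are all hypothetical choices made for illustration.

```python
import hashlib


class WindowedFingerprintTable:
    """Illustrative sketch only (not the paper's structure): answers
    "is item among the most recent w items?" for any w <= n."""

    def __init__(self, n, fingerprint_bits=16):
        self.n = n                          # maximum supported window size
        self.fingerprint_bits = fingerprint_bits
        self.clock = 0                      # number of items seen so far
        self.table = {}                     # fingerprint -> latest arrival index

    def _fingerprint(self, item):
        # Hypothetical fingerprint: low-order bits of a SHA-256 digest.
        digest = hashlib.sha256(repr(item).encode("utf-8")).digest()
        mask = (1 << self.fingerprint_bits) - 1
        return int.from_bytes(digest[:8], "big") & mask

    def insert(self, item):
        self.clock += 1
        # A colliding fingerprint only moves the index forward, so an item
        # that is still inside the window can never be reported as absent.
        self.table[self._fingerprint(item)] = self.clock
        # Lazy purge (amortized constant work per insert): drop entries that
        # are older than the largest window once the table exceeds 2n entries.
        if len(self.table) > 2 * self.n:
            cutoff = self.clock - self.n
            self.table = {f: t for f, t in self.table.items() if t > cutoff}

    def query(self, item, w):
        if not 1 <= w <= self.n:
            raise ValueError("window size w must satisfy 1 <= w <= n")
        last_seen = self.table.get(self._fingerprint(item))
        # Present if some item with this fingerprint arrived in the last w steps.
        return last_seen is not None and last_seen > self.clock - w


if __name__ == "__main__":
    sketch = WindowedFingerprintTable(n=1000, fingerprint_bits=32)
    for packet_id in ["a", "b", "c", "d"]:
        sketch.insert(packet_id)
    print(sketch.query("d", 1), sketch.query("a", 4))
```

In this sketch a false positive arises only when a different item with the same fingerprint arrived within the queried window, so the error rate is governed by the fingerprint length, while a query for an item that is genuinely in the window always succeeds.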
Keywords :
Internet; computational complexity; data structures; query processing; randomised algorithms; resource allocation; Internet applications; cache directory; data stream; linear running time; load-balancing; near-optimal approximate membership query; network applications; network traffic monitoring; nontrivial space lower bound; optimal algorithms; peer-to-peer systems; randomized data structure; social network services; space-efficient data structure; time-decaying window model; Algorithm design and analysis; Data models; Data structures; Dictionaries; Internet; Radiation detectors; Security;