Title :
Importance-aware Bloom Filter for managing set membership queries on streaming data
Author :
Bhoraskar, R. ; Gabale, Vijay ; Kulkarni, Parag ; Kulkarni, Devdatta
Author_Institution :
Comput. Sci. & Eng., Univ. of Washington, Seattle, WA, USA
Abstract :
In this work1, we consider a set of networking applications which generate or process a continuous stream of data items, for example, a web-cache which processes a stream of web-objects. These applications often require to answer membership queries for duplicate detection on an unbounded set of data items. Two key challenges to answer such membership queries are the limited space to store the entire stream and the different importance-levels associated with different data items. For instance, a web-caches are of finite sizes and the cost of fetching objects into the cache is proportional to the size of objects. Motivated by such examples, our work focuses on developing a time and space efficient indexing and membership query scheme which takes into account importance levels of objects in a data stream. We propose Importance-aware Bloom Filter (IBF) which provides a set of insertion and deletion algorithms to handle membership queries on a data stream. Our evaluation of IBF for a synthetic as well as a real data set of a stream of Youtube videos, shows that IBF has close to 0% error in answering membership queries on important data items, and it results in 4-fold better performance in comparison to other importance-agnostic Bloom filter-based schemes. Importantly, we also find properties of IBF analytically via a Markov model based analysis. Thus, we believe, IBF provides a practical framework to balance the application-specific requirements to index and query data streams based on the data semantics.
Keywords :
Markov processes; data communication; data structures; video streaming; 4-fold better performance; Markov model; Youtube videos; application-specific requirements; continuous stream; data items; data semantics; different data items; fetching objects; importance-aware bloom filter; membership query scheme; set membership queries management; space efficient indexing; streaming data; web-caches; web-objects streaming; Arrays; Crawlers; Indexing; Limiting; Maintenance engineering; Markov processes;
Conference_Titel :
Communication Systems and Networks (COMSNETS), 2013 Fifth International Conference on
Conference_Location :
Bangalore
Print_ISBN :
978-1-4673-5330-4
Electronic_ISBN :
978-1-4673-5329-8
DOI :
10.1109/COMSNETS.2013.6465564