Title :
A New Method for Estimating the Number of Distinct Values over Data Streams
Author :
Guo, Longjiang ; Li, Yingshu ; Ren, Meirui ; Zhang, Zhongzhao
Author_Institution :
Electron. & Inf. Technol. Acad., Harbin Inst. of Technol., Harbin, China
Abstract :
Virtually all query optimization methods in a data stream management system (DSMS) require a means of estimating the number of distinct values of an attribute in a data stream. An accurate estimate of this number can be crucial for selecting a good query plan. Because data streams are continuous, real-time, and unbounded, they cannot be stored in full in limited memory. Estimating the number of distinct values over a data stream is therefore harder than for data that can be stored in full. In this paper, drawing on the properties of data streams and an analysis of the Bloom filter, we present a new estimation method based on a circular Bloom filter that uses limited space. Storing the distinct values in the circular Bloom filter addresses the problem that a data stream cannot be held in limited memory. Theoretical analysis and experimental results indicate that the estimation method is feasible and highly effective.
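The abstract does not say how the circular Bloom filter is organized, so the following is only a minimal sketch of the underlying idea using a plain (non-circular) Bloom filter: each stream value is tested against a fixed-size bit array, and the distinct count is incremented only when the value was not already recorded. The class name `BloomDistinctCounter`, the parameters `num_bits` and `num_hashes`, and the SHA-256 double-hashing scheme are all assumptions made for illustration and are not taken from the paper.

```python
import hashlib


class BloomDistinctCounter:
    """Hypothetical sketch: count distinct stream values with a plain Bloom filter.

    This is NOT the paper's circular Bloom filter; it only illustrates how a
    fixed-size bit array can stand in for storing the stream itself.
    """

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits          # size of the bit array (assumed value)
        self.num_hashes = num_hashes      # hash functions per value (assumed value)
        self.bits = bytearray((num_bits + 7) // 8)
        self.distinct = 0                 # running estimate of distinct values

    def _positions(self, value):
        # Derive num_hashes bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def observe(self, value):
        """Record one stream value; return True if it was counted as new."""
        is_new = False
        for pos in self._positions(value):
            byte, mask = pos >> 3, 1 << (pos & 7)
            if not self.bits[byte] & mask:
                is_new = True             # an unset bit proves the value is unseen
                self.bits[byte] |= mask
        if is_new:
            self.distinct += 1
        return is_new


counter = BloomDistinctCounter()
for item in ["a", "b", "a", "c", "b", "a"]:
    counter.observe(item)
print(counter.distinct)
```

In this sketch a value is counted only when at least one of its bits was still unset, so the estimate can fall below the true number of distinct values (when a false positive hides a new value) but never exceeds it. Memory use stays fixed at `num_bits` bits regardless of stream length.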
Keywords :
database management systems; information filters; query processing; circular Bloom filter; data stream management system; distinct value estimation; query optimization method; Artificial intelligence; Computer science; Distributed computing; Electronic mail; Information technology; Intelligent networks; Query processing; Software engineering; State estimation; Streaming media; BloomFilter; Data Streams; circular BloomFilter; the Number of Distinct Values
Conference_Title :
Software Engineering, Artificial Intelligences, Networking and Parallel/Distributed Computing, 2009. SNPD '09. 10th ACIS International Conference on
Conference_Location :
Daegu
Print_ISBN :
978-0-7695-3642-2
DOI :
10.1109/SNPD.2009.39