DocumentCode
1879187
Title
A New Method for Estimating the Number of Distinct Values over Data Streams
Author
Guo, Longjiang ; Li, Yingshu ; Ren, Meirui ; Zhang, Zhongzhao
Author_Institution
Electron. & Inf. Technol. Acad., Harbin Inst. of Technol., Harbin, China
fYear
2009
fDate
27-29 May 2009
Firstpage
71
Lastpage
76
Abstract
Virtually all query optimization methods in a data stream management system (DSMS) require a means of estimating the number of distinct values of an attribute in a data stream. Accurate assessment of the number of distinct values can be crucial for selecting a good query plan. Because data streams are continuous, real-time, and unbounded, they cannot be stored in full in limited memory, which makes estimating the number of distinct values over data streams a more difficult problem. In this paper, drawing on the properties of data streams and an analysis of the Bloom filter, we present a new estimation method based on a circular Bloom filter that uses limited space. Storing the distinct values in a circular Bloom filter effectively solves the problem that data streams cannot be stored in limited memory. Theoretical analysis and experimental results indicate that the estimation method is feasible and highly effective.
Keywords
database management systems; information filters; query processing; circular Bloom filter; data stream management system; distinct value estimation; query optimization method; Artificial intelligence; Computer science; Distributed computing; Electronic mail; Information technology; Intelligent networks; Query processing; Software engineering; State estimation; Streaming media; BloomFilter; Data Streams; circular BloomFilter; the Number of Distinct Values;
fLanguage
English
Publisher
IEEE
Conference_Titel
Software Engineering, Artificial Intelligences, Networking and Parallel/Distributed Computing, 2009. SNPD '09. 10th ACIS International Conference on
Conference_Location
Daegu
Print_ISBN
978-0-7695-3642-2
Type
conf
DOI
10.1109/SNPD.2009.39
Filename
5286690