• DocumentCode
    1879187
  • Title
    A New Method for Estimating the Number of Distinct Values over Data Streams
  • Author
    Guo, Longjiang ; Li, Yingshu ; Ren, Meirui ; Zhang, Zhongzhao
  • Author_Institution
    Electron. & Inf. Technol. Acad., Harbin Inst. of Technol., Harbin, China
  • fYear
    2009
  • fDate
    27-29 May 2009
  • Firstpage
    71
  • Lastpage
    76
  • Abstract
    Virtually all query optimization methods in a data stream management system (DSMS) require a means of estimating the number of distinct values of an attribute in a data stream. Accurate assessment of the number of distinct values can be crucial for selecting a good query plan. Because data streams are continuous, real-time, and unbounded, a data stream cannot be stored effectively in limited memory. Estimating the number of distinct values over data streams is therefore a more difficult problem. In this paper, drawing on the properties of data streams and an analysis of the Bloom filter, we present a new estimation method based on a circular Bloom filter that uses limited space. We store the distinct values in the circular Bloom filter, which effectively addresses the problem that data streams cannot be stored in limited memory. Theoretical analysis and experimental results indicate that the estimation method is feasible and highly effective.
  • Keywords
    database management systems; information filters; query processing; circular Bloom filter; data stream management system; distinct value estimation; query optimization method; Artificial intelligence; Computer science; Distributed computing; Electronic mail; Information technology; Intelligent networks; Query processing; Software engineering; State estimation; Streaming media; BloomFilter; Data Streams; circular BloomFilter; the Number of Distinct Values;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Software Engineering, Artificial Intelligences, Networking and Parallel/Distributed Computing, 2009. SNPD '09. 10th ACIS International Conference on
  • Conference_Location
    Daegu
  • Print_ISBN
    978-0-7695-3642-2
  • Type
    conf
  • DOI
    10.1109/SNPD.2009.39
  • Filename
    5286690