• DocumentCode
    1625444
  • Title
    End-biased Samples for Join Cardinality Estimation
  • Author
    Estan, Cristian; Naughton, Jeffrey F.
  • Author_Institution
    University of Wisconsin-Madison
  • fYear
    2006
  • Firstpage
    20
  • Lastpage
    20
  • Abstract
    We present a new technique for using samples to estimate join cardinalities. This technique, which we term "end-biased samples," is inspired by recent work in network traffic measurement. It improves on random samples by using coordinated pseudo-random samples and retaining the sampled values in proportion to their frequency. We show that end-biased samples always provide more accurate estimates than random samples with the same sample size. The comparison with histograms is more interesting: while end-biased histograms are somewhat better than end-biased samples for uncorrelated data sets, end-biased samples dominate by a large margin when the data is correlated. Finally, we compare end-biased samples to the recently proposed "skimmed sketches" and show that neither dominates the other; each has different and compelling strengths and weaknesses. These results suggest that end-biased samples may be a useful addition to the repertoire of techniques used for data summarization.
  • Keywords
    Data engineering; Frequency; Histograms; Monitoring; Multidimensional systems; Probability distribution; Sampling methods; Telecommunication traffic;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering, 2006. ICDE '06. Proceedings of the 22nd International Conference on
  • Print_ISBN
    0-7695-2570-9
  • Type
    conf
  • DOI
    10.1109/ICDE.2006.61
  • Filename
    1617388