• DocumentCode
    2494646
  • Title

    Sampling-based selectivity estimation for joins using augmented frequent value statistics

  • Author

    Haas, Peter J. ; Swami, Arun N.

  • Author_Institution
    IBM Almaden Res. Center, San Jose, CA, USA
  • fYear
    1995
  • fDate
    6-10 Mar 1995
  • Firstpage
    522
  • Lastpage
    531
  • Abstract
    We compare empirically the cost of estimating the selectivity of a star join using the sampling-based t-cross procedure to the cost of computing the join and obtaining the exact answer. The relative cost of sampling can be excessive when a join attribute value exhibits “heterogeneous skew.” To alleviate this problem, we propose Algorithm TCM, a modified version of t-cross that incorporates “augmented frequent value” (AFV) statistics. We provide a sampling-based method for estimating AFV statistics that does not require indexes on attribute values, requires only one pass through each relation, and uses an amount of memory much smaller than the size of a relation. Our experiments show that the use of estimated AFV statistics can reduce the relative cost of sampling by orders of magnitude. We also show that use of estimated AFV statistics can reduce the relative error of the classical System R selectivity formula.
  • Keywords
    query processing; relational databases; augmented frequent value statistics; heterogeneous skew; join attribute value; sampling-based selectivity estimation; sampling-based t-cross procedure; star join; Capacity planning; Concatenated codes; Cost function; Error analysis; Error correction; Query processing; Relational databases; Sampling methods; Silicon; Statistics;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering, 1995. Proceedings of the Eleventh International Conference on
  • Conference_Location
    Taipei
  • Print_ISBN
    0-8186-6910-1
  • Type

    conf

  • DOI
    10.1109/ICDE.1995.380361
  • Filename
    380361