DocumentCode :
897458
Title :
Domains and active domains: what this distinction implies for the estimation of projection sizes in relational databases
Author :
Ciaccia, Paolo ; Maio, Dario
Author_Institution :
Dipartimento di Elettronica, Inf. e Sistemistica, Bologna Univ., Italy
Volume :
7
Issue :
4
fYear :
1995
fDate :
8/1/1995
Firstpage :
641
Lastpage :
655
Abstract :
Database optimizers require statistical information about data distributions in order to evaluate result sizes and access plan costs for processing user queries. In this context, we consider the problem of estimating the size of the projections of a database relation, when measures on attribute domain cardinalities are maintained in the system. Our main theoretical contribution is a new formal model, the AD (active domain) model, which is valid under the hypotheses of attribute independence and uniform distribution of attribute values, derived considering the difference between the time-invariant domain (the set of values that an attribute can assume) and the time-dependent (“active”) domain (the set of values that are actually assumed, at a certain time). Early models developed under the same assumptions are shown to be formally incorrect. Since the AD model is computationally highly demanding, we also introduce an approximate, easy-to-compute model, the A2D (approximate active domain) model that, unlike previous approximations, yields low errors over the entire parameter space of the active domain cardinalities. Finally, we extend the A2D model to the case of nonuniform distributions and present experimental results confirming the good behavior of the model.
Keywords :
active databases; database theory; error statistics; query processing; relational databases; A2D model; AD model; active domains; approximate active domain model; attribute domain cardinalities; attribute independence; combinatorial models; data distributions; database optimizers; error estimate; nonuniform distributions; parameter space errors; plan costs; projection size estimation; query optimization; relational databases; statistical information; statistical profile; time-dependent domain; time-invariant domain; uniform attribute values distribution; user query processing; Aggregates; Computational modeling; Cost function; Estimation error; Histograms; Query processing; Relational databases; Size measurement;
fLanguage :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
IEEE
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/69.404035
Filename :
404035