Title :
A contingency approach to estimating record selectivities
Author_Institution :
Coll. of Bus., Ohio State Univ., Columbus, OH, USA
fDate :
6/1/1991
Abstract :
An approach to estimating record selectivity, rooted in the theory of fitting a hierarchy of models in discrete data analysis, is presented. In contrast to parametric methods, this approach does not presuppose a distribution pattern to which the actual data conform; it searches for one that fits the actual data. The approach uses parsimonious models wherever appropriate in order to minimize the storage requirement without sacrificing accuracy. Two-dimensional cases are used as examples to illustrate the proposed method. It is demonstrated that identifying a good-fitting and parsimonious model can drastically reduce storage space and that implementing this technique requires little extra processing effort. The case of perfect or near-perfect association and the idea of keeping information about salient cells of a table are discussed. A strategy is proposed for reducing the storage requirement in cases in which a good-fitting and parsimonious model is not available. Hierarchical models for three-dimensional cases are presented, along with a description of the W.E. Deming and F.F. Stephan (1940) iterative proportional fitting algorithm, which fits hierarchical models of any dimension.
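The two-dimensional case mentioned in the abstract can be illustrated with a minimal sketch of iterative proportional fitting. This is not the paper's implementation: the function name `ipf_2d`, its parameters, and the sample marginal totals are hypothetical and chosen only for illustration. The sketch assumes the independence model, in which only the row and column totals of an r-by-c table are stored (r + c values rather than r * c cells) and the cell counts are reconstructed by alternately rescaling rows and columns until both sets of margins match.

```python
def ipf_2d(row_totals, col_totals, max_iter=100, tol=1e-9):
    """Fit a two-way table whose margins match the given totals.

    Hypothetical helper: starts from a table of ones, which
    corresponds to fitting the independence model.
    """
    n_rows, n_cols = len(row_totals), len(col_totals)
    table = [[1.0] * n_cols for _ in range(n_rows)]

    for _ in range(max_iter):
        # Step 1: rescale each row so that it sums to its target total.
        for i in range(n_rows):
            row_sum = sum(table[i])
            if row_sum > 0:
                factor = row_totals[i] / row_sum
                table[i] = [v * factor for v in table[i]]

        # Step 2: rescale each column so that it sums to its target total.
        for j in range(n_cols):
            col_sum = sum(table[i][j] for i in range(n_rows))
            if col_sum > 0:
                factor = col_totals[j] / col_sum
                for i in range(n_rows):
                    table[i][j] *= factor

        # Stop once the row margins are still satisfied after column scaling.
        err = max(abs(sum(table[i]) - row_totals[i]) for i in range(n_rows))
        if err < tol:
            break

    return table


# Illustrative (made-up) marginal counts for two attributes of 100 records.
row_totals = [50.0, 50.0]
col_totals = [60.0, 40.0]

fitted = ipf_2d(row_totals, col_totals)
total = sum(row_totals)

# Estimated selectivity of the conjunctive predicate for cell (0, 0).
selectivity_00 = fitted[0][0] / total
print(fitted, selectivity_00)
```

Under these assumptions the estimated selectivity of a conjunctive predicate is the fitted cell count divided by the total record count. For a model with interaction terms, the same alternating rescaling would be applied to whichever marginal tables the chosen hierarchical model retains.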
Keywords :
information retrieval systems; relational databases; storage management; contingency approach; discrete data analysis; hierarchical models; iterative proportional fitting algorithm; near-perfect association; parsimonious models; record selectivities; storage requirement; storage space; three-dimensional cases; Cost function; Data analysis; Data models; Database systems; Histograms; Information retrieval; Parameter estimation; Query processing; Relational databases; Shape
Journal_Title :
Software Engineering, IEEE Transactions on