• DocumentCode
    3143508
  • Title
    Deriving probabilistic databases with inference ensembles
  • Author
    Stoyanovich, Julia; Davidson, Susan; Milo, Tova; Tannen, Val
  • Author_Institution
    Univ. of Pennsylvania, Philadelphia, PA, USA
  • fYear
    2011
  • fDate
    11-16 April 2011
  • Firstpage
    303
  • Lastpage
    314
  • Abstract
    Many real-world applications deal with uncertain or missing data, prompting a surge of activity in the area of probabilistic databases. A shortcoming of prior work is the assumption that an appropriate probabilistic model, along with the necessary probability distributions, is given. We address this shortcoming by presenting a framework for learning a set of inference ensembles, termed meta-rule semi-lattices, or MRSL, from the complete portion of the data. We use the MRSL to infer probability distributions for missing data, and demonstrate experimentally that high accuracy is achieved when a single attribute value is missing per tuple. We next propose an inference algorithm based on Gibbs sampling that accurately predicts the probability distribution for multiple missing values. We also develop an optimization that greatly improves performance of multi-attribute inference for collections of tuples, while maintaining high accuracy. Finally, we develop an experimental framework to evaluate the efficiency and accuracy of our approach.
  • Keywords
    inference mechanisms; statistical databases; statistical distributions; Gibbs sampling; MRSL; inference ensemble algorithm; meta-rule semilattices; missing data; probabilistic database model; probability distributions; tuple collection; Accuracy; Association rules; Computational modeling; Itemsets; Probabilistic logic; Probability distribution
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering (ICDE), 2011 IEEE 27th International Conference on
  • Conference_Location
    Hannover
  • ISSN
    1063-6382
  • Print_ISBN
    978-1-4244-8959-6
  • Electronic_ISBN
    1063-6382
  • Type
    conf
  • DOI
    10.1109/ICDE.2011.5767854
  • Filename
    5767854