Title :
Top-k query processing in probabilistic databases with non-materialized views
Author :
Dylla, M. ; Miliaraki, I. ; Theobald, M.
Author_Institution :
Max Planck Institute for Informatics, Saarbrücken, Germany
Abstract :
We investigate a novel approach to computing confidence bounds for top-k ranking queries in probabilistic databases with non-materialized views. Unlike related approaches, we present an exact pruning algorithm that finds the top-ranked query answers according to their marginal probabilities without first materializing all answer candidates via the views. Specifically, we consider conjunctive queries over multiple levels of select-project-join views; the views are cast into Datalog rules, which we ground in a top-down fashion directly at query processing time. To our knowledge, this work is the first to address integrated data and confidence computations for intensional query evaluation in probabilistic databases by considering confidence bounds over first-order lineage formulas. We extend our query processing techniques with a suite of scheduling strategies based on selectivity estimation and the expected impact on confidence bounds. Further extensions include improved top-k bounds when sorted relations are available as input, as well as support for recursive rules. Experiments with large datasets demonstrate significant runtime improvements of our approach over both exact and sampling-based top-k methods for probabilistic data.
Keywords :
DATALOG; database management systems; probability; query processing; scheduling; Datalog rule; confidence bound; confidence computation; conjunctive query; first-order lineage formula; integrated data; intensional query evaluation; marginal probability; nonmaterialized view; probabilistic database; pruning algorithm; scheduling strategy; select-project-join views; selectivity estimation; top-k bound; top-k query processing; top-k ranking query; top-ranked query answer; Grounding; Motion pictures; Probabilistic logic; Query processing; Semantics; Upper bound;
Conference_Title :
Data Engineering (ICDE), 2013 IEEE 29th International Conference on
Conference_Location :
Brisbane, QLD
Print_ISBN :
978-1-4673-4909-3
ISSN :
1063-6382
DOI :
10.1109/ICDE.2013.6544819