• DocumentCode
    3122401
  • Title
    SPROUT: Lazy vs. Eager Query Plans for Tuple-Independent Probabilistic Databases
  • Author
    Olteanu, Dan ; Huang, Jiewen ; Koch, Christoph
  • Author_Institution
    Comput. Lab., Oxford Univ., Oxford
  • fYear
    2009
  • fDate
    March 29 2009-April 2 2009
  • Firstpage
    640
  • Lastpage
    651
  • Abstract
    A paramount challenge in probabilistic databases is the scalable computation of confidences of tuples in query results. This paper introduces an efficient secondary-storage operator for exact computation of queries on tuple-independent probabilistic databases. We consider the conjunctive queries without self-joins that are known to be tractable on any tuple-independent database, and queries that are not tractable in general but become tractable on probabilistic databases restricted by functional dependencies. Our operator is semantically equivalent to a sequence of aggregations and can be naturally integrated into existing relational query plans. As a proof of concept, we developed an extension of the PostgreSQL 8.3.3 query engine called SPROUT. We study optimizations that push or pull our operator or parts thereof past joins. The operator employs static information, such as the query structure and functional dependencies, to decide which constituent aggregations can be evaluated together in one scan and how many scans are needed for the overall confidence computation task. A case study on the TPC-H benchmark reveals that most TPC-H queries obtained by removing aggregations can be evaluated efficiently using our operator. Experimental evaluation on probabilistic TPC-H data shows substantial efficiency improvements when compared to the state of the art.
  • Keywords
    SQL; optimisation; query processing; PostgreSQL 8.3.3 query engine; SPROUT; eager query plans; lazy query plans; optimization; secondary storage operator; tuple-independent probabilistic databases; Cleaning; Computer science; Data engineering; Engines; Laboratories; Polynomials; Probability distribution; Random variables; Relational databases; USA Councils; PostgreSQL; Probabilistic Databases; Query Evaluation; Query Optimization
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering, 2009. ICDE '09. IEEE 25th International Conference on
  • Conference_Location
    Shanghai
  • ISSN
    1084-4627
  • Print_ISBN
    978-1-4244-3422-0
  • Electronic_ISBN
    1084-4627
  • Type
    conf
  • DOI
    10.1109/ICDE.2009.123
  • Filename
    4812442