• DocumentCode
    610364
  • Title
    Optimizing approximations of DNF query lineage in probabilistic XML
  • Author
    Souihli, A. ; Senellart, P.
  • Author_Institution
    LTCI, Telecom ParisTech, Paris, France
  • fYear
    2013
  • fDate
    8-12 April 2013
  • Firstpage
    721
  • Lastpage
    732
  • Abstract
    Probabilistic XML is a probabilistic model for uncertain tree-structured data, with applications to data integration, information extraction, or uncertain version control. We explore in this work efficient algorithms for evaluating tree-pattern queries with joins over probabilistic XML or, more specifically, for listing the answers to a query along with their computed or approximated probability. The approach relies on, first, producing the query lineage by evaluating the query over the probabilistic XML document, and, second, looking for an optimal strategy to compute the probability of the lineage formula. This latter part relies on a query-optimizer-like approach: exploring different evaluation plans for different parts of the formula and estimating the cost of each plan, using a cost model for the various evaluation algorithms. We demonstrate the efficiency of this approach on datasets used in previous research on probabilistic XML querying, as well as on synthetic data. We also compare the performance of our query engine with EvalDP [1], Trio [2], and MayBMS/SPROUT [3].
  • Keywords
    XML; information retrieval; tree data structures; DNF query lineage; EvalDP; MayBMS/SPROUT; Trio; approximation optimization; cost model; data integration; information extraction; lineage query; optimal strategy; probabilistic XML document; probabilistic XML querying; probabilistic model; query engine; query-optimizer; tree-pattern queries; uncertain tree-structured data; uncertain version control; Additives; Approximation algorithms; Approximation methods; Computational modeling; Data models; Probabilistic logic; XML;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering (ICDE), 2013 IEEE 29th International Conference on
  • Conference_Location
    Brisbane, QLD
  • ISSN
    1063-6382
  • Print_ISBN
    978-1-4673-4909-3
  • Electronic_ISBN
    1063-6382
  • Type
    conf
  • DOI
    10.1109/ICDE.2013.6544869
  • Filename
    6544869
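
The abstract describes computing the probability of a DNF lineage formula by choosing between evaluation algorithms according to an estimated cost. The following is only a minimal sketch of that general idea, not the authors' engine: it assumes independent Boolean events, and the function names (`exact_probability`, `sampled_probability`, `evaluate`), the `max_exact_vars` threshold, and the example events `e1`, `e2`, `e3` are all hypothetical. The "cost model" here is a crude stand-in that simply counts variables.

```python
import itertools
import random

# A DNF lineage is a list of clauses; each clause maps an event name to the
# truth value it requires. Events are assumed independent (an assumption of
# this sketch, not a claim about the paper's model).

def clause_holds(clause, world):
    """Return True if every literal of the clause is satisfied in the world."""
    return all(world[var] == value for var, value in clause.items())

def variables_of(dnf):
    """Collect the distinct event names used in the formula."""
    return sorted({var for clause in dnf for var in clause})

def exact_probability(dnf, probs):
    """Exact probability by enumerating all 2^n truth assignments."""
    variables = variables_of(dnf)
    total = 0.0
    for values in itertools.product([False, True], repeat=len(variables)):
        world = dict(zip(variables, values))
        if any(clause_holds(clause, world) for clause in dnf):
            weight = 1.0
            for var in variables:
                weight *= probs[var] if world[var] else 1.0 - probs[var]
            total += weight
    return total

def sampled_probability(dnf, probs, samples=20000, seed=0):
    """Naive Monte Carlo estimate: sample worlds and count satisfying ones."""
    rng = random.Random(seed)
    variables = variables_of(dnf)
    hits = 0
    for _ in range(samples):
        world = {var: rng.random() < probs[var] for var in variables}
        if any(clause_holds(clause, world) for clause in dnf):
            hits += 1
    return hits / samples

def evaluate(dnf, probs, max_exact_vars=16):
    """Toy plan choice: enumerate when cheap enough, otherwise sample."""
    if len(variables_of(dnf)) <= max_exact_vars:
        return "exact", exact_probability(dnf, probs)
    return "sampling", sampled_probability(dnf, probs)

# Hypothetical lineage: (e1 AND e2) OR (e3 AND NOT e1)
probs = {"e1": 0.5, "e2": 0.4, "e3": 0.9}
lineage = [{"e1": True, "e2": True}, {"e3": True, "e1": False}]
method, probability = evaluate(lineage, probs)
print(method, round(probability, 4))
```

In this example the two clauses are mutually exclusive, so the exact value is 0.5 × 0.4 + 0.5 × 0.9 = 0.65. A realistic optimizer would estimate costs per sub-formula and could mix several algorithms, as the abstract indicates; the single threshold above is only a placeholder for that decision.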