Title :
Path Tree: Document Synopsis for XPath Query Selectivity Estimation
Author :
Alrammal, Muath ; Hains, Gaétan ; Zergaoui, Mohamed
Author_Institution :
Innovimax SARL, Paris, France
Date :
June 30 2011 to July 2 2011
Abstract :
XML is one of the most important standards for manipulating data on the Internet. However, querying large volumes of XML data is a bottleneck for several computationally intensive applications. One solution is to process the document in streaming mode, using resources approximately proportional to document depth and query selectivity. Limited processing space can then accommodate much larger documents. However, the actual savings vary so widely that they are unpredictable. To overcome this limitation of stream processing, we propose a new application of the path tree synopsis data structure. Such a synopsis provides a succinct description of the original document, with low computational overhead and high accuracy for processing tasks such as selectivity estimation and query answer approximation. In this paper, we formally define the path tree synopsis, which was introduced informally and used in earlier work, and propose a new streaming algorithm to construct it. We also present an online stream-querying system that can estimate the cost of a given query before answering it accurately. The core algorithm is adapted from LQ; we apply it to path tree traversal, cost estimation, query processing, and even optimization.
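As an illustration of the idea only, the following minimal sketch builds a path tree in one streaming pass, under the common reading that the synopsis keeps one node per distinct root-to-element tag path together with an occurrence count. It is not the paper's algorithm (which is adapted from LQ); the names `PathTreeNode`, `PathTreeBuilder`, `build_path_tree` and `estimate_child_path` are hypothetical, and only child-axis paths without predicates are handled.

```python
import xml.sax


class PathTreeNode:
    """One node per distinct root-to-element tag path (assumed definition)."""

    def __init__(self, tag):
        self.tag = tag
        self.count = 0       # number of document elements reached by this path
        self.children = {}   # child tag -> PathTreeNode


class PathTreeBuilder(xml.sax.ContentHandler):
    """Builds the synopsis from SAX events; the stack grows only with depth."""

    def __init__(self):
        super().__init__()
        self.root = PathTreeNode(None)   # virtual node above the document element
        self.stack = [self.root]

    def startElement(self, name, attrs):
        parent = self.stack[-1]
        node = parent.children.get(name)
        if node is None:
            node = parent.children[name] = PathTreeNode(name)
        node.count += 1
        self.stack.append(node)

    def endElement(self, name):
        self.stack.pop()


def build_path_tree(xml_text):
    """Parse an XML string in streaming fashion and return the synopsis root."""
    builder = PathTreeBuilder()
    xml.sax.parseString(xml_text.encode("utf-8"), builder)
    return builder.root


def estimate_child_path(root, steps):
    """Element count for a child-axis-only path, e.g. ['lib', 'book', 'title']."""
    node = root
    for tag in steps:
        node = node.children.get(tag)
        if node is None:
            return 0
    return node.count


doc = "<lib><book><title/><author/><author/></book><book><title/></book></lib>"
tree = build_path_tree(doc)
print(estimate_child_path(tree, ["lib", "book", "author"]))
```

For a query such as `/lib/book/author`, the count stored at the matching synopsis node is the number of matching elements, so the cost can be read off before the document itself is queried. Predicates, descendant axes and wildcards would need additional traversal logic that this sketch does not attempt.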
Keywords :
Internet; XML; computational complexity; query processing; tree data structures; XPath query selectivity estimation; computational overhead; data manipulation; document preprocessing; online stream querying system; path tree synopsis data structure; query answer approximation; query optimization; stream processing; streaming algorithm; Accuracy; Data structures; Doped fiber amplifiers; Estimation; Q measurement
Conference_Title :
Complex, Intelligent and Software Intensive Systems (CISIS), 2011 International Conference on
Conference_Location :
Seoul
Print_ISBN :
978-1-61284-709-2
Electronic_ISBN :
978-0-7695-4373-4
DOI :
10.1109/CISIS.2011.53