Title of article :
Processing XML streams with deterministic automata and stream indexes
Author/Authors :
Green, Todd J.; Gupta, Ashish; Miklau, Gerome; Onizuka, Makoto; Suciu, Dan
Issue Information :
Journal with serial number, year 2000
Pages :
-751
From page :
752
To page :
0
Abstract :
We consider the problem of evaluating a large number of XPath expressions on a stream of XML packets. We contribute two novel techniques. The first is to use a single Deterministic Finite Automaton (DFA). The contribution here is to show that the DFA can be used effectively for this problem: in our experiments we achieve a constant throughput, independently of the number of XPath expressions. The major issue is the size of the DFA, which, in theory, can be exponential in the number of XPath expressions. We provide a series of theoretical results and experimental evaluations that show that the lazy DFA has a small number of states, for all practical purposes. These results are of general interest in XPath processing, beyond stream processing. The second technique is the Streaming IndeX (SIX), which consists of adding a small amount of binary data to each XML packet that allows the query processor to achieve significant speedups. As an application of these techniques we describe the XML Toolkit (XMLTK), a collection of command-line tools providing highly scalable XML data processing.
Keywords :
XML processing, stream processing
Journal title :
ACM Transactions on Database Systems
Serial Year :
2000
Record number :
2649