DocumentCode :
2724765
Title :
Parallel XML Parsing Using Meta-DFAs
Author :
Pan, Yinfei ; Zhang, Ying ; Chiu, Kenneth ; Lu, Wei
Author_Institution :
State Univ. of New York - Binghamton, Binghamton
fYear :
2007
fDate :
10-13 Dec. 2007
Firstpage :
237
Lastpage :
244
Abstract :
By leveraging the growing prevalence of multicore CPUs, parallel XML parsing (PXP) can significantly improve the performance of XML, enhancing its suitability for scientific data, which is often dominated by floating-point numbers. One approach is to divide the XML document into equal-sized chunks and parse each chunk in parallel. XML parsing is inherently sequential, however, because the state of an XML parser when reading a given character depends potentially on all preceding characters. In previous work, we addressed this by using a fast preparsing scan to build an outline of the document, which we called the skeleton. The skeleton is then used to guide the parallel full parse. The preparse is a sequential phase that limits scalability, however, and so in this paper we show how the preparse itself can be parallelized using a mechanism we call a meta-DFA. For each state q of the original preparser, the meta-DFA incorporates a complete copy of the preparser state machine as a sub-DFA which starts in state q. The meta-DFA thus runs multiple instances of the preparser simultaneously when parsing a chunk, with each possible preparser state at the beginning of a chunk represented by an instance. By pursuing all possibilities simultaneously, the meta-DFA allows each chunk to be preparsed independently in parallel. The parallel full parse following the preparse is performed using libxml2, and outputs DOM trees that are fully compatible with existing applications that use libxml2. Our implementation scales well on a 30-CPU Sun E6500 machine.
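The meta-DFA idea described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the three-state toy tokenizer (`text`, `tag`, `quote`), the function names, and the use of a thread pool. The sketch only tracks the DFA state; the actual preparser also builds the skeleton, which is not modeled here.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy preparser states (not the paper's state machine).
STATES = ("text", "tag", "quote")

def step(state, ch):
    """One transition of the toy sequential DFA."""
    if state == "text":
        return "tag" if ch == "<" else "text"
    if state == "tag":
        if ch == ">":
            return "text"
        if ch == '"':
            return "quote"
        return "tag"
    # state == "quote": stay inside the attribute value until the closing quote
    return "tag" if ch == '"' else "quote"

def run_meta(chunk):
    """Run one sub-DFA per possible start state over a chunk.

    The meta-state is a tuple with one entry per sub-DFA; entry i is the
    current state of the instance that started in STATES[i].
    Returns a mapping from start state to end state for this chunk.
    """
    meta = STATES
    for ch in chunk:
        meta = tuple(step(s, ch) for s in meta)
    return dict(zip(STATES, meta))

def split(doc, n):
    """Split doc into at most n roughly equal-sized chunks."""
    size = max(1, -(-len(doc) // n))  # ceiling division
    return [doc[i:i + size] for i in range(0, len(doc), size)]

def final_state(doc, n_chunks, start="text"):
    """Process chunks independently, then stitch the results in order."""
    chunks = split(doc, n_chunks)
    # Chunks do not depend on each other, so they can be mapped concurrently.
    with ThreadPoolExecutor() as pool:
        maps = list(pool.map(run_meta, chunks))
    # Only this cheap stitching pass is sequential.
    state = start
    for m in maps:
        state = m[state]
    return state

def sequential_final_state(doc, start="text"):
    """Reference: the ordinary single-pass DFA."""
    state = start
    for ch in doc:
        state = step(state, ch)
    return state
```

For any chunk count, `final_state(doc, k)` should agree with `sequential_final_state(doc)`, because each chunk's mapping covers every state the true parse could be in when the chunk begins. The thread pool here only demonstrates that the chunks are independent; it is not meant to reproduce the scaling reported in the abstract.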
Keywords :
XML; finite state machines; parallelising compilers; tree data structures; DOM trees; XML document; deterministic finite state automata; floating-point numbers; libxml2 preparser; meta-DFA; parallel XML parsing; preparser state machine; Application software; Computer science; Grid computing; Hardware; Multicore processing; Parallel processing; Scalability; Skeleton; Sun; XML;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
e-Science and Grid Computing, IEEE International Conference on
Conference_Location :
Bangalore
Print_ISBN :
978-0-7695-3064-2
Type :
conf
DOI :
10.1109/E-SCIENCE.2007.55
Filename :
4426893