Title :
PAXQuery: Efficient Parallel Processing of Complex XQuery
Author :
Camacho-Rodriguez, Jesus ; Colazzo, Dario ; Manolescu, Ioana
Author_Institution :
Hortonworks Inc., Santa Clara, CA, USA
Abstract :
Increasing volumes of data are being produced and exchanged over the Web, in particular in tree-structured formats such as XML or JSON. This creates a need for highly scalable algorithms and tools for processing such data, capable of taking advantage of massively parallel processing platforms. This work considers the problem of efficiently parallelizing the execution of complex nested data processing expressed in XQuery. We provide novel algorithms showing how to translate such queries into PACT, a recent framework that generalizes MapReduce, in particular by supporting many-input tasks. We present the first formal translation of complex XQuery algebraic expressions into PACT plans, and demonstrate experimentally the efficiency and scalability of our approach.
Keywords :
Internet; parallel processing; query processing; trees (mathematics); MapReduce; PACT; PAXQuery; Web; complex XQuery algebraic expressions; complex nested data processing; many-input task; tree-structured formats; Algebra; Contracts; Data models; Navigation; Optimization; Vegetation; XML; XML data management; XQuery parallelization; XQuery processing
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
DOI :
10.1109/TKDE.2015.2391110