Title :
Distributed Evaluation of XPath Axes Queries over Large XML Documents Stored in MapReduce Clusters
Author :
Šenk, A. ; Valenta, M. ; Benn, W.
Author_Institution :
Czech Tech. Univ. FIT, Prague, Czech Republic
Abstract :
The MapReduce (MR) framework, a programming model for parallel computation over data stored in a cluster of commodity computers, has established itself as one of the leading solutions for Big Data processing. The framework is also used as a query language in many database systems because it can process data stored in various unstructured, semi-structured, and structured formats. Although the MR framework can be used for XML data processing as well, it does not allow queries to be written in a declarative language such as XPath or XQuery. To overcome this problem, we propose a system that lets users query XML data with XPath and evaluates the queries in parallel using the MR framework. First, we introduce a persistent storage that maps XML data into a wide-column store; the proposed mapping enables efficient, distributed data processing. Second, we describe a query processor that translates a subset of the XPath language into MR jobs. Finally, we present tests whose results show the scalability of our system.
Keywords :
Big Data; XML; distributed processing; document handling; query processing; MR framework; MapReduce clusters; XML documents; XPath; XQuery; big data processing; commodity computer cluster; database systems; declarative manner; distributed XPath axes queries evaluation; parallel computation; query processor; wide-column store; Computers; Data models; Indexes; Query processing; Scalability; XML;
Conference_Title :
Database and Expert Systems Applications (DEXA), 2014 25th International Workshop on
Conference_Location :
Munich
Print_ISBN :
978-1-4799-5721-7
DOI :
10.1109/DEXA.2014.59