Title :
H2RDF+: High-performance distributed joins over large-scale RDF graphs
Author :
Papailiou, Nikolaos ; Konstantinou, Ioannis ; Tsoumakos, Dimitrios ; Karras, Panagiotis ; Koziris, Nectarios
Author_Institution :
Comput. Syst. Lab., Nat. Tech. Univ. of Athens, Athens, Greece
Abstract :
The proliferation of data in RDF format calls for efficient and scalable solutions for their management. While scalability in the era of big data is a hard requirement, modern systems fail to adapt to the complexity of the query. Current approaches do not scale well when faced with substantially complex, non-selective joins, resulting in exponential growth of execution times. In this work we present H2RDF+, an RDF store that efficiently performs distributed Merge and Sort-Merge joins over a multiple index scheme. H2RDF+ is highly scalable, utilizing distributed MapReduce processing and HBase indexes. Using aggressive byte-level compression and result grouping over fast scans, it can process both complex and selective join queries in a highly efficient manner. Furthermore, it adaptively chooses between single- and multi-machine execution based on join complexity estimated through index statistics. Our extensive evaluation demonstrates that H2RDF+ answers non-selective joins an order of magnitude faster than both current state-of-the-art distributed and centralized stores, while being only tenths of a second slower in simple queries, and that it scales linearly with the amount of available resources.
Keywords :
data handling; distributed processing; graph theory; query processing; H2RDF+; HBase indexes; byte-level compression; data proliferation; distributed MapReduce processing; large-scale RDF graphs; query complexity; Distributed databases; Educational institutions; Indexing; Partitioning algorithms; Resource description framework; Scalability; Distributed Indexing; Distributed Merge-Joins; HBase; MapReduce; RDF; SPARQL;
Conference_Title :
Big Data, 2013 IEEE International Conference on
Conference_Location :
Silicon Valley, CA
DOI :
10.1109/BigData.2013.6691582