Title :
Towards Load Balancing and Parallelizing of RDF Query Processing in P2P Based Distributed RDF Data Stores
Author :
Ali, L. ; Janson, Thomas ; Schindelhauer, Christian
Author_Institution :
Univ. of Freiburg, Freiburg, Germany
Abstract :
To evaluate RDF queries in Peer-to-Peer (P2P) based RDF data stores, the location of an RDF triple in the network must be derivable from a triple pattern in the given query. An existing strategy, used by state-of-the-art distributed RDF data stores, fulfills this requirement by storing each triple at three locations so that it can be found by its subject, predicate, or object identifier. A major drawback of this strategy is poor load balancing, caused by the fact that the frequency of subject, predicate, and object occurrences in triples is not uniformly distributed. While the majority of URIs and literals occur very rarely, some occur very frequently (e.g., the peer responsible for 'rdf:type' is subjected to a very high storage load). In addition, this skewed distribution of RDF triples among network peers leads to an unfair distribution of query processing load and to long query processing times. To cope with hotspots caused by unfair data load distribution, we propose an optimized routing index scheme in which triples are indexed on the combination of their subject, predicate, and object components. This paper also shows how we can exploit this novel index scheme to achieve a better distribution of query processing load and faster query response times by bundling the computation resources and bandwidth of peers through parallelism.
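The hotspot effect described in the abstract can be illustrated with a minimal sketch. It is not the paper's routing index: the abstract does not specify how the combined key is built, so hashing the full triple is an assumption made here purely to show the difference in storage load, and it does not address how a triple pattern with variables would be resolved. The names `peer_for`, `three_location_load`, `combined_key_load`, and the network size `NUM_PEERS` are hypothetical.

```python
import hashlib
from collections import Counter

NUM_PEERS = 8  # hypothetical network size, chosen only for illustration


def peer_for(key: str, num_peers: int = NUM_PEERS) -> int:
    """Map a key to a peer id with a stable hash (stand-in for a DHT lookup)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_peers


def three_location_load(triples, num_peers: int = NUM_PEERS) -> Counter:
    """Existing strategy: store each triple once per subject, predicate, object."""
    load = Counter()
    for s, p, o in triples:
        for component in (s, p, o):
            load[peer_for(component, num_peers)] += 1
    return load


def combined_key_load(triples, num_peers: int = NUM_PEERS) -> Counter:
    """Assumed variant: store each triple once under a key built from all parts."""
    load = Counter()
    for s, p, o in triples:
        load[peer_for("\x00".join((s, p, o)), num_peers)] += 1
    return load


# Synthetic, skewed data: every triple shares the predicate 'rdf:type'.
triples = [(f"ex:item{i}", "rdf:type", "ex:Thing") for i in range(1000)]

by_component = three_location_load(triples)
by_combination = combined_key_load(triples)

hot_peer = peer_for("rdf:type")
print("load on the 'rdf:type' peer:", by_component[hot_peer])
print("max load, per-component keys:", max(by_component.values()))
print("max load, combined keys:", max(by_combination.values()))
```

With per-component keys, the peer responsible for 'rdf:type' receives a copy of every triple in this synthetic data set, whereas the combined keys spread the triples across peers according to the hash.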
Keywords :
distributed databases; peer-to-peer computing; query processing; resource allocation; P2P based distributed RDF data stores; RDF query processing parallelization; RDF triple; computation resources; load balancing; long query processing time; network peers; object identifier; object occurrence frequency; peer-to-peer based RDF data stores; peers bandwidth; predicate frequency; resource description framework; skewed RDF triples distribution; subject frequency; unfair data load distribution; unfair query processing load distribution; Bandwidth; Distributed databases; Indexing; Peer-to-peer computing; Query processing; Resource description framework;
Conference_Title :
Parallel, Distributed and Network-Based Processing (PDP), 2014 22nd Euromicro International Conference on
Conference_Location :
Torino
DOI :
10.1109/PDP.2014.79