Title :
On the Impact of Data Distribution in Federated SPARQL Queries
Author :
Rakhmawati, Nur Aini ; Hausenblas, Michael
Author_Institution :
Digital Enterprise Res. Inst., Nat. Univ. of Ireland, Galway, Galway, Ireland
Abstract :
With the growing number of publicly available SPARQL endpoints, federated queries become more and more attractive and feasible. Compared to queries against a single endpoint, queries that range over a number of endpoints pose new challenges, ranging from the type and number of datasets involved to the data distribution across the datasets. Existing research focuses on the data distribution in a central store and is mainly concerned with adopting well-known, traditional database techniques. In this work we investigate the impact of the data distribution in the context of federated SPARQL queries. We perform a number of experiments with four federation frameworks (Sesame Alibaba, Splendid, FedX, and Darq) against an RDF dataset, Dailymed, that we partition by graph and class. Our preliminary results confirm the intuition that the more datasets are involved in query processing, the worse the performance of a federated query is, and that the data distribution significantly influences the performance.
Keywords :
data handling; database management systems; graph theory; query languages; query processing; Dailymed; Darq; FedX; RDF dataset; SPARQL endpoints; Sesame Alibaba; Splendid; class; data distribution impact; database techniques; federated SPARQL query; graph; query processing; Catalogs; Coherence; Distributed databases; Query processing; Resource description framework; Time factors;
Conference_Title :
Semantic Computing (ICSC), 2012 IEEE Sixth International Conference on
Conference_Location :
Palermo
Print_ISBN :
978-1-4673-4433-3
DOI :
10.1109/ICSC.2012.72