DocumentCode :
1791796
Title :
Optimizing graph queries with graph joins and Sprinkle SPARQL
Author :
Goodman, Eric L. ; Jimenez, Edward ; Al-Saffar, Sinan ; Joslyn, Cliff ; Haglin, David ; Grunwald, Dirk
Author_Institution :
Sandia Nat. Labs., Albuquerque, NM, USA
fYear :
2014
fDate :
27-30 Oct. 2014
Firstpage :
17
Lastpage :
24
Abstract :
Big data problems are often more akin to sparse graphs than to relational tables. As such, we argue that graph-based physical representations provide advantages in terms of both size and speed for executing queries. Drawing from research in sparse matrices, we use a compressed sparse row (CSR) format to model graph-oriented data. We also present two novel mechanisms for exploiting the CSR format that both find optimal join strategies and prune variable bindings before expensive join operations occur. The first tactic, which we call Sprinkle SPARQL, takes the triple patterns of SPARQL queries and performs low-cost, linear-time set intersections to produce a constrained list of variable bindings for each variable in a query. Besides constrained lists of variable bindings, Sprinkle SPARQL also produces metrics that are consumed by the join algorithm to select an optimal execution path. The second tactic, graph joins, uses the CSR data structure as an index to efficiently join the two variables expressed in a triple pattern. We evaluate our approach on two data sets with over a billion edges: LUBM(8000) and an R-MAT graph generated with Graph500 parameters and extended to have edge labels.
Keywords :
Big Data; graph theory; query processing; sparse matrices; Big Data problems; CSR data structure; CSR format; Graph500 parameters; LUBM(8000); R-MAT graph; Sprinkle SPARQL; compressed sparse row format; graph joins; graph query optimization; graph-based physical representations; graph-oriented data modelling; linear-time set intersections; optimal execution path; optimal join strategies; relational tables; sparse graphs; variable binding pruning; Arrays; Dictionaries; Educational institutions; Indexes; Resource description framework; Throughput
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Big Data (Big Data), 2014 IEEE International Conference on
Conference_Location :
Washington, DC
Type :
conf
DOI :
10.1109/BigData.2014.7004463
Filename :
7004463