Title :
Composing Data Parallel Code for a SPARQL Graph Engine
Author :
Castellana, Vito Giovanni ; Tumeo, Antonino ; Villa, Oreste ; Haglin, David ; Feo, John
Author_Institution :
Pacific Northwest Nat. Lab., Richland, WA, USA
Abstract :
The emergence of petascale triple stores has motivated the investigation of alternatives to traditional table-based relational methods. Since triple stores represent data as structured tuples, graphs are a natural data structure for encoding their information. The use of graph data structures, rather than tables, requires us to rethink the methods used to process queries on the store. We are developing a scalable, in-memory SPARQL graph engine that scales to hundreds of nodes while maintaining constant query throughput. Our framework comprises a compiler from SPARQL to data parallel C, a library of parallel graph methods, and a custom multithreaded runtime layer for multinode commodity systems. Rather than transforming SPARQL queries into a series of select and join operations on tables, our front end compiles the queries into data parallel C code with calls to graph methods that walk internal data structures, constructing answers in their wake. In this paper, we describe the compilation process and give examples of the generated C code parallelized with OpenMP. We present performance numbers for the SP2Bench SPARQL benchmark queries on a 48-core shared-memory system. Compared with conventional relational database systems such as Virtuoso, our approach uses less memory and provides higher performance.
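To illustrate the idea of compiling a query into data parallel C rather than into table joins, the following is a minimal sketch, not the code the paper's compiler emits. It assumes a hypothetical dictionary-encoded triple type `triple_t`, a hypothetical graph method `has_edge`, and a hypothetical generated function `count_article_creators` for the query `SELECT ?x ?y WHERE { ?x <type> <Article> . ?x <creator> ?y }`. For brevity the store is a flat array scanned linearly; the engine described in the abstract walks graph data structures instead.

```c
#include <stddef.h>

/* Hypothetical dictionary-encoded triple: subject, predicate, object ids. */
typedef struct { int s, p, o; } triple_t;

/* Hypothetical graph method: returns 1 if the store holds the edge (s, p, o).
 * A linear scan stands in for a real adjacency-list lookup. */
static int has_edge(const triple_t *g, size_t n, int s, int p, int o)
{
    for (size_t i = 0; i < n; i++)
        if (g[i].s == s && g[i].p == p && g[i].o == o)
            return 1;
    return 0;
}

/* Sketch of generated code for
 *   SELECT ?x ?y WHERE { ?x <type> <Article> . ?x <creator> ?y }
 * Each loop iteration tests one candidate <creator> edge independently,
 * so the loop is data parallel; the OpenMP reduction sums the number of
 * matching (?x, ?y) bindings across threads. */
long count_article_creators(const triple_t *g, size_t n,
                            int type_p, int article_o, int creator_p)
{
    long count = 0;
    #pragma omp parallel for reduction(+:count)
    for (long i = 0; i < (long)n; i++) {
        if (g[i].p == creator_p &&
            has_edge(g, n, g[i].s, type_p, article_o))
            count++;
    }
    return count;
}
```

Built with an OpenMP-enabled compiler (for example `-fopenmp` with GCC), the loop is shared among threads; without that flag the pragma is ignored and the same code runs serially with the same result.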
Keywords :
C language; SQL; data structures; multi-threading; program compilers; query processing; relational databases; OpenMP; SP2Bench SPARQL benchmark queries; SPARQL queries; Virtuoso; compilation process; data parallel C code; data parallel C compiler; data parallel code; data representation; graph data structures; in-memory SPARQL graph engine; information encoding; internal data structures; multinode commodity systems; multithreaded runtime layer; parallel graph methods; petascale triple stores; queries processing; query throughput; relational database systems; shared-memory system; structured tuples; table-based relational methods; Data structures; Databases; Engines; Libraries; Optimization; Pattern matching; Resource description framework; PGAS; SPARQL; multithreading;
Conference_Title :
Social Computing (SocialCom), 2013 International Conference on
Conference_Location :
Alexandria, VA
DOI :
10.1109/SocialCom.2013.104