DocumentCode
141903
Title
Towards optimization of RDF analytical queries on MapReduce
Author
Ravindra, Padmashree
Author_Institution
Dept. of Comput. Sci., North Carolina State Univ., Raleigh, NC, USA
fYear
2014
fDate
March 31 2014-April 4 2014
Firstpage
335
Lastpage
339
Abstract
The broadened use of Semantic Web technologies across domains has led to a shift in focus from simple pattern matching queries on RDF data to analytical queries with complex grouping and aggregations. An RDF analytical query involves graph pattern matching, which translates to several join operations due to the fine-grained nature of the RDF data model. Complex analytical queries involve multiple grouping-aggregations on different graph patterns, making such tasks join-intensive. Scale-out processing of RDF analytical queries on existing relational-style MapReduce platforms such as Apache Hive and Pig results in lengthy execution workflows with multiple cycles of I/O and network transfer. Additionally, certain graph patterns result in avoidable redundancy in intermediate results, which increases processing costs. The PhD thesis summarized in this paper proposes a two-pronged approach to minimize costs when processing RDF queries on MapReduce: an algebraic approach based on a Nested TripleGroup Data Model and Algebra that reinterprets graph pattern queries in a way that reduces the required number of map-reduce cycles, and special strategies to minimize the redundancy in intermediate data while processing certain graph patterns. The proposed techniques are integrated into Apache Pig. Empirical evaluation of this work on graph pattern queries shows 45-60% performance gains over systems such as Pig and Hive.
Keywords
algebra; graph theory; optimisation; pattern matching; query processing; semantic Web; Apache Pig; MapReduce; RDF analytical queries; RDF data model; algebraic approach; graph patterns; multiple grouping-aggregations; nested triplegroup data model; optimization; semantic Web technologies; simple pattern matching queries; two-pronged approach; Algebra; Data models; Optimization; Pattern matching; Redundancy; Resource description framework
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Engineering Workshops (ICDEW), 2014 IEEE 30th International Conference on
Conference_Location
Chicago, IL
Type
conf
DOI
10.1109/ICDEW.2014.6818351
Filename
6818351