  • DocumentCode
    141903
  • Title
    Towards optimization of RDF analytical queries on MapReduce
  • Author
    Ravindra, Padmashree
  • Author_Institution
    Dept. of Comput. Sci., North Carolina State Univ., Raleigh, NC, USA
  • fYear
    2014
  • fDate
    March 31 2014-April 4 2014
  • Firstpage
    335
  • Lastpage
    339
  • Abstract
    The broadened use of Semantic Web technologies across domains has shifted the focus from simple pattern-matching queries on RDF data to analytical queries with complex grouping and aggregation. An RDF analytical query involves graph pattern matching, which translates into several join operations because of the fine-grained nature of the RDF data model. Complex analytical queries involve multiple grouping-aggregations over different graph patterns, making such tasks join-intensive. Scale-out processing of RDF analytical queries on existing relational-style MapReduce platforms, such as Apache Hive and Pig, results in lengthy execution workflows with multiple cycles of I/O and network transfer. Additionally, certain graph patterns produce avoidable redundancy in intermediate results, which increases processing costs. The PhD thesis summarized in this paper proposes a two-pronged approach to minimizing these costs when processing RDF queries on MapReduce: (1) an algebraic approach, based on a Nested TripleGroup Data Model and Algebra, that reinterprets graph pattern queries so as to reduce the required number of map-reduce cycles, and (2) special strategies that minimize redundancy in intermediate data when processing certain graph patterns. The proposed techniques are integrated into Apache Pig. An empirical evaluation on graph pattern queries shows 45-60% performance gains over systems such as Pig and Hive.
  • Keywords
    algebra; graph theory; optimisation; pattern matching; query processing; semantic Web; Apache Pig; MapReduce; RDF analytical queries; RDF data model; algebraic approach; graph patterns; multiple grouping-aggregations; nested triplegroup data model; optimization; semantic Web technologies; simple pattern matching queries; two-pronged approach; Algebra; Data models; Optimization; Pattern matching; Redundancy; Resource description framework;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering Workshops (ICDEW), 2014 IEEE 30th International Conference on
  • Conference_Location
    Chicago, IL
  • Type
    conf
  • DOI
    10.1109/ICDEW.2014.6818351
  • Filename
    6818351
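The abstract's central idea, evaluating a star-shaped graph pattern by grouping triples rather than by performing one join per triple pattern, can be illustrated with a minimal Python sketch. It assumes that triplegroups are formed by grouping triples on their subject, and it simulates the map-reduce grouping cycle in memory. The function names (`group_by_subject`, `match_star`, `flatten`) and the sample data are hypothetical and are not taken from the thesis or from its Apache Pig implementation.

```python
from collections import defaultdict
from itertools import product

# Each RDF triple is modeled as (subject, property, object).
Triple = tuple[str, str, str]


def group_by_subject(triples: list[Triple]) -> dict[str, list[Triple]]:
    """Simulate one map-reduce cycle: the map step keys each triple by its
    subject, and the shuffle/reduce step collects each subject's triples
    into a single triplegroup (hypothetical helper)."""
    groups: dict[str, list[Triple]] = defaultdict(list)
    for s, p, o in triples:
        groups[s].append((s, p, o))
    return dict(groups)


def match_star(triples: list[Triple], properties: list[str]) -> list[tuple[str, list[Triple]]]:
    """Match a star pattern {?s p1 ?o1 . ?s p2 ?o2 . ...} by keeping only the
    triplegroups that contain every required property. The group stays
    nested instead of being flattened into join-style rows."""
    required = set(properties)
    matches = []
    for subject, group in group_by_subject(triples).items():
        present = {p for _, p, _ in group}
        if required <= present:
            matches.append((subject, group))
    return matches


def flatten(group: list[Triple], properties: list[str]) -> list[tuple[str, ...]]:
    """Expand a triplegroup into flat rows, as a chain of relational joins
    would: one column per property and one row per combination of values."""
    subject = group[0][0]
    per_property = [[o for _, p, o in group if p == prop] for prop in properties]
    return [(subject, *combo) for combo in product(*per_property)]


# Hypothetical sample data: pub1 has three authors, pub2 lacks an author,
# and pub3 lacks a title.
SAMPLE_TRIPLES: list[Triple] = [
    ("pub1", "title", "T1"),
    ("pub1", "author", "A"),
    ("pub1", "author", "B"),
    ("pub1", "author", "C"),
    ("pub2", "title", "T2"),
    ("pub3", "author", "D"),
]

if __name__ == "__main__":
    for subject, group in match_star(SAMPLE_TRIPLES, ["title", "author"]):
        print(subject, "nested:", len(group), "triples")
        print(subject, "flattened:", flatten(group, ["title", "author"]))
```

On the sample data, only `pub1` has both a `title` and an `author`, so it is the single match. Flattening its triplegroup yields three rows that each repeat the subject and the title, whereas the nested group stores them once. This is one simple way to see how a nested representation can limit redundant intermediate data.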