• DocumentCode
    3717334
  • Title
    LiteMat: A scalable, cost-efficient inference encoding scheme for large RDF graphs
  • Author
    Olivier Cure;Hubert Naacke;Tendry Randriamalala;Bernd Amann
  • Author_Institution
    LIP6 CNRS UMR 7606, Sorbonne Universites, UPMC Univ Paris 06, F-75005, Paris, France
  • fYear
    2015
  • Firstpage
    1823
  • Lastpage
    1830
  • Abstract
    The number of linked data sources and the size of the linked open data graph keep growing every day. As a consequence, semantic RDF services are increasingly confronted with various "big data" problems. Query processing in the presence of inferences is one of them. For instance, to complete the answer set of SPARQL queries, RDF database systems evaluate semantic RDFS relationships (subPropertyOf, subClassOf) through time-consuming query rewriting algorithms or space-consuming data materialization solutions. To reduce the memory footprint and ease the exchange of large datasets, these systems generally apply a dictionary approach that compresses triple data by replacing resource identifiers (IRIs), blank nodes and literals with integer values. In this article, we present a structured resource identification scheme that uses a clever encoding of concept and property hierarchies to efficiently evaluate the most common RDFS entailment rules while minimizing triple materialization and query rewriting. We show how this encoding can be computed by a scalable parallel algorithm and implemented directly over the Apache Spark framework. The efficiency of our encoding scheme is demonstrated by an evaluation conducted over both synthetic and real-world datasets.
  • Keywords
    "Resource description framework","Encoding","Sparks","Ontologies","Semantics","Query processing","Big data"
  • Publisher
    ieee
  • Conference_Titel
    Big Data (Big Data), 2015 IEEE International Conference on
  • Type
    conf
  • DOI
    10.1109/BigData.2015.7363955
  • Filename
    7363955