• DocumentCode
    2213624
  • Title
    Scalable RDF store based on HBase and MapReduce
  • Author
    Sun, Jianling ; Jin, Qiang
  • Author_Institution
    Dept. of Comput. Sci., Zhejiang Univ., Hangzhou, China
  • Volume
    1
  • fYear
    2010
  • fDate
    20-22 Aug. 2010
  • Abstract
    The growing size of Resource Description Framework (RDF) datasets requires RDF repositories to be highly scalable and efficient, a need that distributed and parallel processing models meet naturally. In this paper, we propose a scalable RDF store based on HBase, a distributed, column-oriented database modeled after Google's Bigtable. Our approach adopts the idea of Hexastore and takes into account both the RDF data model and the capabilities of HBase. We store RDF triples in six HBase tables (S_PO, P_SO, O_SP, PS_O, SO_P and PO_S), which cover all combinations of RDF triple patterns, and index them with the row-key index structure that HBase provides. Besides presenting the storage schema, we also propose a MapReduce strategy for SPARQL Basic Graph Pattern (BGP) processing that suits this schema. It uses multiple MapReduce jobs to process a typical BGP; in each job, it uses a greedy method to select the join key and eliminates multiple triple patterns. The evaluation results indicate that our approach works well on large RDF datasets.
  • Keywords
    distributed databases; parallel processing; HBase; Hexastore; MapReduce strategy; RDF triple patterns; SPARQL basic graph pattern processing; column-oriented database; distributed database; distributed processing model; greedy method; parallel processing model; resource description framework; scalable RDF store; Analytical models; Indexes; Irrigation; Lead; Semantics; Web pages; MapReduce; RDF; SPARQL
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advanced Computer Theory and Engineering (ICACTE), 2010 3rd International Conference on
  • Conference_Location
    Chengdu
  • ISSN
    2154-7491
  • Print_ISBN
    978-1-4244-6539-2
  • Type
    conf
  • DOI
    10.1109/ICACTE.2010.5578937
  • Filename
    5578937
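
The storage schema summarized in the abstract (six tables, one per bound-term combination) can be illustrated with a minimal in-memory sketch. This is not the authors' implementation: plain Python dictionaries stand in for HBase tables, and the class name `SixIndexStore`, the function `choose_table`, the row-key ordering inside composite keys, and the handling of fully bound patterns are all assumptions made for illustration. Only the six table names come from the abstract.

```python
from collections import defaultdict

# The six table names come from the abstract; everything else is hypothetical.
TABLES = ("S_PO", "P_SO", "O_SP", "PS_O", "SO_P", "PO_S")


def choose_table(s=None, p=None, o=None):
    """Pick the table whose row key matches the bound terms (None = variable)."""
    bound = (s is not None, p is not None, o is not None)
    mapping = {
        (True, False, False): "S_PO",
        (False, True, False): "P_SO",
        (False, False, True): "O_SP",
        (True, True, False): "PS_O",
        (True, False, True): "SO_P",
        (False, True, True): "PO_S",
        # Assumption: a fully bound pattern is answered from PS_O plus a filter.
        (True, True, True): "PS_O",
    }
    if bound not in mapping:
        raise ValueError("a pattern with no bound term needs a full scan")
    return mapping[bound]


class SixIndexStore:
    """Dict-backed stand-in for the six HBase tables (illustration only)."""

    def __init__(self):
        # Each table maps a row key to the set of remaining triple terms.
        self.tables = {name: defaultdict(set) for name in TABLES}

    def add(self, s, p, o):
        # Every triple is written once into each of the six tables.
        t = self.tables
        t["S_PO"][s].add((p, o))
        t["P_SO"][p].add((s, o))
        t["O_SP"][o].add((s, p))
        t["PS_O"][(p, s)].add(o)
        t["SO_P"][(s, o)].add(p)
        t["PO_S"][(p, o)].add(s)

    def match(self, s=None, p=None, o=None):
        """Return the set of (s, p, o) triples matching one triple pattern."""
        name = choose_table(s, p, o)
        t = self.tables[name]
        if name == "S_PO":
            return {(s, p2, o2) for (p2, o2) in t.get(s, ())}
        if name == "P_SO":
            return {(s2, p, o2) for (s2, o2) in t.get(p, ())}
        if name == "O_SP":
            return {(s2, p2, o) for (s2, p2) in t.get(o, ())}
        if name == "PS_O":
            return {(s, p, o2) for o2 in t.get((p, s), ())
                    if o is None or o2 == o}
        if name == "SO_P":
            return {(s, p2, o) for p2 in t.get((s, o), ())}
        return {(s2, p, o) for s2 in t.get((p, o), ())}  # PO_S
```

In this sketch a pattern such as (?x, knows, carol) resolves to a single row-key lookup in PO_S, which is the property the six-table layout is meant to provide; the cost is that every triple is stored six times.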