• DocumentCode
    2181103
  • Title
    Extensions to the Pig data processing platform for scalable RDF data processing using Hadoop
  • Author
    Tanimura, Yusuke ; Matono, Akiyoshi ; Lynden, Steven ; Kojima, Isao
  • Author_Institution
    Inf. Technol. Res. Inst., Nat. Inst. of Adv. Ind. Sci. & Technol., Tsukuba, Japan
  • fYear
    2010
  • fDate
    1-6 March 2010
  • Firstpage
    251
  • Lastpage
    256
  • Abstract
    In order to effectively handle the growing amount of available RDF data, a scalable and flexible RDF data processing framework is needed. We previously proposed a Hadoop-based framework, which takes advantage of scalable and fault-tolerant distributed processing technologies, originally proposed as Google's distributed file system and MapReduce parallel model. In this paper, we present a method extending the Pig data processing platform on top of the Hadoop infrastructure. Pig compiles programs written in a high-level language, called Pig Latin, into MapReduce programs that can be executed by Hadoop. In order to support RDF, Pig was extended with the ability to load and store RDF data efficiently. Furthermore, as reasoning is an important requirement for most systems storing RDF data, support for inferring new triples using entailment rules was also added. In this paper, we describe these extensions and present an evaluation of their performance.
  • Keywords
    Java; distributed databases; fault tolerant computing; high level languages; Google; Hadoop infrastructure; MapReduce parallel model; Pig Latin; Pig data processing platform; distributed file system; entailment rules; fault tolerant distributed processing; high level language; scalable RDF data processing; Data processing; Distributed processing; File systems; Information technology; OWL; Open source software; Relational databases; Resource description framework; Scalability; Semantic Web;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering Workshops (ICDEW), 2010 IEEE 26th International Conference on
  • Conference_Location
    Long Beach, CA
  • Print_ISBN
    978-1-4244-6522-4
  • Electronic_ISBN
    978-1-4244-6521-7
  • Type
    conf
  • DOI
    10.1109/ICDEW.2010.5452704
  • Filename
    5452704