• DocumentCode
    1809386
  • Title
    Distributed Storage and Querying Techniques for a Semantic Web of Scientific Workflow Provenance
  • Author
    Abraham, John ; Brazier, Pearl ; Chebotko, Artem ; Navarro, Jaime ; Piazza, Anthony
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Texas-Pan American, Edinburg, TX, USA
  • fYear
    2010
  • fDate
    5-10 July 2010
  • Firstpage
    178
  • Lastpage
    185
  • Abstract
    In scientific workflow environments, the reproducibility of scientific discoveries, the interpretation of results, and the diagnosis of problems depend primarily on provenance, which records the history of an in-silico experiment. The Resource Description Framework (RDF) is frequently used to represent provenance based on vocabularies such as the Open Provenance Model. For complex scientific workflows that generate large numbers of RDF triples, single-machine provenance management becomes inadequate over time. In this paper, we investigate how the Bigtable-like capabilities of HBase can be leveraged for distributed storage and querying of provenance data represented in RDF. In particular, we architect the ProvBase system, which incorporates an HBase/Hadoop backend, propose a storage schema to hold provenance triples, and design querying algorithms to evaluate SPARQL queries in the system. Using the Third Provenance Challenge queries, we conduct an experimental study to show the feasibility of our approach.
  • Keywords
    algorithm theory; data analysis; distributed databases; query processing; semantic Web; workflow management software; HBase Bigtable; ProvBase system; SPARQL query; distributed storage; open provenance model; problem diagnosis; resource description framework; result interpretation; scientific discovery; scientific workflow; semantic Web; single-machine provenance management; Distributed databases; Pattern matching; Relational databases; Resource description framework; Servers; HBase; RDF; SPARQL; Semantic Web; provenance; querying; scientific workflow;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Services Computing (SCC), 2010 IEEE International Conference on
  • Conference_Location
    Miami, FL
  • Print_ISBN
    978-1-4244-8147-7
  • Electronic_ISBN
    978-0-7695-4126-6
  • Type
    conf
  • DOI
    10.1109/SCC.2010.14
  • Filename
    5557230