  • DocumentCode
    2224136
  • Title
    Recording and using provenance in a protein compressibility experiment
  • Author
    Groth, Paul; Miles, Simon; Fang, Weijian; Wong, Sylvia C.; Zauner, Klaus-Peter; Moreau, Luc
  • Author_Institution
    Sch. of Electron. & Comput. Sci., Southampton Univ., UK
  • fYear
    2005
  • fDate
    24-27 July 2005
  • Firstpage
    201
  • Lastpage
    208
  • Abstract
    Very large-scale computations are now routinely used as a methodology for scientific research. In this context, 'provenance systems' are regarded as the equivalent of the scientist's logbook for in silico experimentation: provenance captures the documentation of the process that led to some result. Using a protein compressibility analysis application, we derive a set of generic use cases for a provenance system. To support these, we address the following fundamental questions: What is provenance? How should it be recorded? What is its performance impact on grid execution? How well does reasoning over it perform? In doing so, we define a technology-independent notion of provenance that captures interactions between components, internal component information, and groupings of interactions, allowing us to analyze and reason about the execution of scientific processes. To support persistent provenance in heterogeneous applications, we introduce a separate provenance store, in which provenance documentation can be stored, archived, and queried independently of the technology used to run the application. Through a series of practical tests, we evaluate the performance impact of such a provenance system. In summary, we demonstrate that the provenance recording overhead of our prototype system remains under 10% of execution time, and we show that the recorded information successfully supports our use cases with good performance.
  • Keywords
    biology computing; grid computing; proteins; reasoning about programs; scientific information systems; component interaction; grid execution; heterogeneous applications; in silico experimentation; interaction grouping; internal component information; process documentation; protein compressibility experiment; provenance documentation; provenance recording; provenance systems; reasoning; scientific process; scientific research; scientist logbook; Bioinformatics; Computer science; Documentation; Independent component analysis; Information analysis; Large-scale systems; Magnetic heads; Physics computing; Proteins; System testing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High Performance Distributed Computing, 2005. HPDC-14. Proceedings. 14th IEEE International Symposium on
  • ISSN
    1082-8907
  • Print_ISBN
    0-7803-9037-7
  • Type
    conf
  • DOI
    10.1109/HPDC.2005.1520960
  • Filename
    1520960