• DocumentCode
    2934424
  • Title

    Dynamic Pipeline Changes in Scientific Data Processing

  • Author

    Mwebaze, Johnson ; Boxhoorn, Danny ; Valentijn, Edwin

  • Author_Institution
    Kapteyn Astron. Inst., Univ. of Groningen, Groningen, Netherlands
  • fYear
    2011
  • fDate
    5-8 Dec. 2011
  • Firstpage
    263
  • Lastpage
    270
  • Abstract
    Understanding the differences between data objects is a major problem, especially in scientific collaborations that allow scientists to collectively reuse data and to modify and adapt scripts developed by their peers to process data, while publishing the results to a centralized data store. Although data provenance has been studied extensively to address the origins of a data item, it does not address changes made to the source code. Systems often consist of a large number of modules, each containing hundreds of lines of code, and it is generally not obvious which parts of the source code contributed to a change in a data object. The paper introduces the Class-Based Object Versioning framework, which overcomes some of the shortcomings of popular versioning systems (e.g. CVS, SVN) in maintaining data and code provenance information in scientific computing environments. The framework automatically identifies and captures useful fine-grained changes in the data and code of scripts that perform scientific experiments, so that important information about intermediate stages (i.e. unrecorded changes in experiment parameters and procedures) can be identified and analyzed.
  • Keywords
    configuration management; data handling; object-oriented programming; scientific information systems; class-based object versioning framework; code provenance information; data provenance; dynamic pipeline; scientific collaboration; scientific computing environment; scientific data processing; source code; Databases; Joining processes; Object recognition; Pipelines; Publishing; Semantics; Software; Astro-WISE; object versioning; scientific computing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    E-Science (e-Science), 2011 IEEE 7th International Conference on
  • Conference_Location
    Stockholm
  • Print_ISBN
    978-1-4577-2163-2
  • Type

    conf

  • DOI
    10.1109/eScience.2011.44
  • Filename
    6123287