• DocumentCode
    47384
  • Title
    Introducing Provenance Capture into a Legacy Data System
  • Author
    Conover, Helen; Ramachandran, R.; Beaumont, B.; Kulkarni, Akhil; McEniry, Michael; Regner, Kathryn; Graves, Sarah
  • Author_Institution
    Inf. Technol. & Syst. Center, Univ. of Alabama in Huntsville, Huntsville, AL, USA
  • Volume
    51
  • Issue
    11
  • fYear
    2013
  • fDate
    Nov. 2013
  • Firstpage
    5098
  • Lastpage
    5104
  • Abstract
    Accurate provenance information facilitates improved understanding of Earth science data and scientific reproducibility and can serve as an indicator of data quality. Provenance capture is an integral part of many modern workflow systems but may not have been considered in the design of legacy data production systems. Furthermore, in addition to data lineage, it is also important to capture contextual information needed for understanding how a data set was produced. This paper describes our experience in retrofitting a legacy data system to support capture, storage, and dissemination of provenance. Data inputs and transformations are logged automatically, while broader context information describing science algorithms and ancillary files is manually compiled. Provenance and context information are integrated for interactive user access and embedded into data files as XML documents compliant with the “Lineage” specification for geographic metadata defined by the International Organization for Standardization in the ISO 19115-2 standard. Lessons learned from this approach can inform others who need to incorporate provenance into a data system after the fact.
  • Keywords
    XML; geographic information systems; geophysical techniques; geophysics computing; interactive programming; meta data; Earth science data; ISO 19115-2 standard; International Organization for Standardization; XML documents; contextual information; data files; data quality indicator; data set; interactive user access; legacy data production system design; metadata; modern workflow systems; provenance capture; provenance dissemination; provenance information; provenance storage; science algorithms; Browsers; Communities; Context; Data systems; Geoscience; Software; Standards; Data management; data processing; geospatial data; metadata standards; provenance; science data systems
  • fLanguage
    English
  • Journal_Title
    Geoscience and Remote Sensing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0196-2892
  • Type
    jour
  • DOI
    10.1109/TGRS.2013.2282817
  • Filename
    6627994