Title :
Introducing Provenance Capture into a Legacy Data System
Author :
Conover, Helen ; Ramachandran, R. ; Beaumont, B. ; Kulkarni, Akhil ; McEniry, Michael ; Regner, Kathryn ; Graves, Sarah
Author_Institution :
Inf. Technol. & Syst. Center, Univ. of Alabama in Huntsville, Huntsville, AL, USA
Abstract :
Accurate provenance information facilitates improved understanding of Earth science data and scientific reproducibility and can serve as an indicator of data quality. Provenance capture is an integral part of many modern workflow systems but may not have been considered in the design of legacy data production systems. Furthermore, in addition to data lineage, it is also important to capture contextual information needed for understanding how a data set was produced. This paper describes our experience in retrofitting a legacy data system to support capture, storage, and dissemination of provenance. Data inputs and transformations are logged automatically, while broader context information describing science algorithms and ancillary files is manually compiled. Provenance and context information are integrated for interactive user access and embedded into data files as XML documents compliant with the “Lineage” specification for geographic metadata defined by the International Organization for Standardization in the ISO 19115-2 standard. Lessons learned from this approach can inform others who need to incorporate provenance into a data system after the fact.
Keywords :
XML; geographic information systems; geophysical techniques; geophysics computing; interactive programming; meta data; Earth science data; ISO 19115-2 standard; International Organization for Standardization; XML documents; contextual information; data files; data quality indicator; data set; interactive user access; legacy data production system design; metadata; modern workflow systems; provenance capture; provenance dissemination; provenance information; provenance storage; science algorithms; Browsers; Communities; Context; Data systems; Geoscience; Software; Standards; Data management; data processing; geospatial data; metadata standards; provenance; science data systems
Journal_Title :
Geoscience and Remote Sensing, IEEE Transactions on
DOI :
10.1109/TGRS.2013.2282817