DocumentCode
47384
Title
Introducing Provenance Capture into a Legacy Data System
Author
Conover, Helen ; Ramachandran, R. ; Beaumont, B. ; Kulkarni, Akhil ; McEniry, Michael ; Regner, Kathryn ; Graves, Sarah
Author_Institution
Inf. Technol. & Syst. Center, Univ. of Alabama in Huntsville, Huntsville, AL, USA
Volume
51
Issue
11
fYear
2013
fDate
Nov. 2013
Firstpage
5098
Lastpage
5104
Abstract
Accurate provenance information facilitates improved understanding of Earth science data and scientific reproducibility and can serve as an indicator of data quality. Provenance capture is an integral part of many modern workflow systems but may not have been considered in the design of legacy data production systems. Furthermore, in addition to data lineage, it is also important to capture contextual information needed for understanding how a data set was produced. This paper describes our experience in retrofitting a legacy data system to support capture, storage, and dissemination of provenance. Data inputs and transformations are logged automatically, while broader context information describing science algorithms and ancillary files is manually compiled. Provenance and context information are integrated for interactive user access and embedded into data files as XML documents compliant with the “Lineage” specification for geographic metadata defined by the International Organization for Standardization in the ISO 19115-2 standard. Lessons learned from this approach can inform others who need to incorporate provenance into a data system after the fact.
Keywords
XML; geographic information systems; geophysical techniques; geophysics computing; interactive programming; meta data; Earth science data; ISO 19115-2 standard; International Organization for Standardization; XML documents; contextual information; data files; data quality indicator; data set; interactive user access; legacy data production system design; metadata; modern workflow systems; provenance capture; provenance dissemination; provenance information; provenance storage; science algorithms; Browsers; Communities; Context; Data systems; Geoscience; Software; Standards; Data management; data processing; geospatial data; metadata standards; provenance; science data systems;
fLanguage
English
Journal_Title
Geoscience and Remote Sensing, IEEE Transactions on
Publisher
IEEE
ISSN
0196-2892
Type
jour
DOI
10.1109/TGRS.2013.2282817
Filename
6627994