• DocumentCode
    2934346
  • Title
    In Situ Data Provenance Capture in Spreadsheets
  • Author
    Asuncion, Hazeline U.
  • Author_Institution
    Comput. & Software Syst., Univ. of Washington, Bothell, WA, USA
  • fYear
    2011
  • fDate
    5-8 Dec. 2011
  • Firstpage
    240
  • Lastpage
    247
  • Abstract
    The capture of data provenance is a fundamentally important task in eScience. While provenance can be captured using techniques such as scientific workflows, typically these techniques do not trace internal data manipulations that occur within off-the-shelf analysis tools. Yet it is still essential to capture data provenance within such environments. This paper discusses an in situ provenance approach for spreadsheet data in MS Excel, a commonly used analysis environment among scientists. We describe the design and implementation of an Excel tool that captures provenance unobtrusively in the background, allows for user annotations, provides undo/redo functionality at various levels of task granularity, and presents the captured provenance in an accessible format to support a range of provenance queries for analysis. We also present several motivating use case scenarios and a user evaluation which suggests that our approach is both efficient and useful to scientists.
  • Keywords
    natural sciences computing; spreadsheet programs; MS Excel; escience; in situ data provenance; off-the-shelf analysis tools; scientific workflows; spreadsheets; user annotations; Context; Data analysis; Data mining; Filtering; Noise; Semantics; data provenance; in situ capture
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    E-Science (e-Science), 2011 IEEE 7th International Conference on
  • Conference_Location
    Stockholm
  • Print_ISBN
    978-1-4577-2163-2
  • Type
    conf
  • DOI
    10.1109/eScience.2011.41
  • Filename
    6123284