  • DocumentCode
    3122683
  • Title
    Differencing Provenance in Scientific Workflows
  • Author
    Bao, Zhuowei ; Cohen-Boulakia, Sarah ; Davidson, Susan B. ; Eyal, Anat ; Khanna, Sanjeev
  • Author_Institution
    Dept. of Comput. & Inf. Sci., Univ. of Pennsylvania, Philadelphia, PA
  • fYear
    2009
  • fDate
    March 29, 2009 - April 2, 2009
  • Firstpage
    808
  • Lastpage
    819
  • Abstract
    Scientific workflow management systems are increasingly providing the ability to manage and query the provenance of data products. However, the problem of differencing the provenance of two data products produced by executions of the same specification has not been adequately addressed. Although this problem is NP-hard for general workflow specifications, an analysis of real scientific (and business) workflows shows that their specifications can be captured as series-parallel graphs overlaid with well-nested forking and looping. For this natural restriction, we present efficient, polynomial-time algorithms for differencing executions of the same specification and thereby understanding the difference in the provenance of their data products. We then describe a prototype called PDiffView built around our differencing algorithm. Experimental results demonstrate the scalability of our approach using collected, real workflows and increasingly complex runs.
  • Keywords
    computational complexity; formal specification; graph theory; workflow management software; NP-hard problem; data products; general workflow specifications; polynomial-time algorithms; scientific workflow management systems; series-parallel graphs; Conference management; Data engineering; Engineering management; Information science; Polynomials; Proteins; Prototypes; Scalability; USA Councils; Workflow management software
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Data Engineering, 2009. ICDE '09. IEEE 25th International Conference on
  • Conference_Location
    Shanghai
  • ISSN
    1084-4627
  • Print_ISBN
    978-1-4244-3422-0
  • Electronic_ISBN
    1084-4627
  • Type
    conf
  • DOI
    10.1109/ICDE.2009.103
  • Filename
    4812456