DocumentCode
3122683
Title
Differencing Provenance in Scientific Workflows
Author
Bao, Zhuowei ; Cohen-Boulakia, Sarah ; Davidson, Susan B. ; Eyal, Anat ; Khanna, Sanjeev
Author_Institution
Dept. of Comput. & Inf. Sci., Univ. of Pennsylvania, Philadelphia, PA
fYear
2009
fDate
March 29 2009-April 2 2009
Firstpage
808
Lastpage
819
Abstract
Scientific workflow management systems are increasingly providing the ability to manage and query the provenance of data products. However, the problem of differencing the provenance of two data products produced by executions of the same specification has not been adequately addressed. Although this problem is NP-hard for general workflow specifications, an analysis of real scientific (and business) workflows shows that their specifications can be captured as series-parallel graphs overlaid with well-nested forking and looping. For this natural restriction, we present efficient, polynomial-time algorithms for differencing executions of the same specification and thereby understanding the difference in the provenance of their data products. We then describe a prototype called PDiffView built around our differencing algorithm. Experimental results demonstrate the scalability of our approach using collected, real workflows and increasingly complex runs.
Keywords
computational complexity; formal specification; graph theory; workflow management software; NP-hard problem; data products; general workflow specifications; polynomial-time algorithms; scientific workflow management systems; series-parallel graphs; Conference management; Data engineering; Engineering management; Information science; Polynomials; Proteins; Prototypes; Scalability; USA Councils; Workflow management software
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Engineering, 2009. ICDE '09. IEEE 25th International Conference on
Conference_Location
Shanghai
ISSN
1084-4627
Print_ISBN
978-1-4244-3422-0
Electronic_ISBN
1084-4627
Type
conf
DOI
10.1109/ICDE.2009.103
Filename
4812456