Title :
Optimized Rollback and Re-computation
Author :
Lakhani, Hasnain ; Tahir, Rashid ; Aqil, Azeem ; Zaffar, Fareed ; Tariq, Dawood ; Gehani, Ashish
Abstract :
Large data processing tasks can be effected using workflow management systems. When either the input data or the programs in the pipeline are modified, the workflow must be re-executed to ensure that the final output data is updated to reflect the changes. Since such re-computation can consume substantial resources, optimizing the system to avoid redundant computation is desirable. In the case of a workflow, the dependency relationships between files are specified at the outset and can be leveraged to track which programs need to be re-executed when particular files change. Current distributed systems cannot provide such functionality when no predefined workflows exist. In this paper, we present an architecture that provides functionality to produce both correct output as well as fast re-execution by leveraging the provenance of data to propagate changes along an implicit dependency graph. We explore the tradeoff between storage and availability by presenting a performance analysis of our rollback and re-execution scheme.
Keywords :
Computational modeling; Context; Data models; History; Performance analysis; Pipelines; Process control; data provenance; error recovery; scientific workflows;
Conference_Titel :
System Sciences (HICSS), 2013 46th Hawaii International Conference on
Conference_Location :
Wailea, HI, USA
Print_ISBN :
978-1-4673-5933-7
Electronic_ISBN :
1530-1605
DOI :
10.1109/HICSS.2013.434