Title :
X-CSR: Dataflow Optimization for Distributed XML Process Pipelines
Author :
Zinn, Daniel ; Bowers, Shawn ; McPhillips, Timothy ; Ludascher, Bertram
Author_Institution :
Dept. of Comput. Sci., UC Davis, Davis, CA
fDate :
March 29 2009-April 2 2009
Abstract :
XML process networks are a simple yet powerful programming paradigm for loosely coupled, coarse-grained dataflow applications such as data-centric scientific workflows. We describe a framework called Delta-XML that is well suited for applications in which pipelines of data processors modify parts ("deltas") of XML data collections while keeping the overall collection structure intact. We show how to optimize the execution of Delta-XML process networks by minimizing the data shipping cost in distributed settings. This X-CSR optimization employs static type inference based on XML Schema to determine the XML stream fragments that are relevant to a processor, allowing irrelevant fragments to be bypassed ("shipped") to downstream pipeline steps. Finally, we present evaluation results for a real-world scientific workflow, which show the practical feasibility of X-CSR. A long version of this paper is available as.
Keywords :
XML; pipeline processing; Delta-XML; X-CSR; coarse-grained dataflow applications; data processors; data-centric scientific workflows; dataflow optimization; distributed XML process pipelines; Corporate acquisitions; Cost function; Data engineering; Design optimization; Distributed processing; Marine vehicles; Pipelines; Process design; Production; actors; data intensive; dataflow; pipeline; scientific workflow; shipping optimization; streaming
Conference_Titel :
Data Engineering, 2009. ICDE '09. IEEE 25th International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-1-4244-3422-0
Electronic_ISBN :
1084-4627
DOI :
10.1109/ICDE.2009.72