DocumentCode
610422
Title
Peeking into the optimization of data flow programs with MapReduce-style UDFs
Author
Hueske, F. ; Peters, Martin ; Krettek, A. ; Ringwald, M. ; Tzoumas, K. ; Markl, V. ; Freytag, J.
Author_Institution
Tech. Univ. Berlin, Berlin, Germany
fYear
2013
fDate
8-12 April 2013
Firstpage
1292
Lastpage
1295
Abstract
Data flows are a popular abstraction to define data-intensive processing tasks. In order to support a wide range of use cases, many data processing systems feature MapReduce-style user-defined functions (UDFs). In contrast to UDFs as known from relational DBMSs, MapReduce-style UDFs have less strict templates. These templates alone do not provide all the information needed to decide whether they can be reordered with relational operators and other UDFs. However, it is well known that reordering operators such as filters, joins, and aggregations can yield runtime improvements by orders of magnitude. We demonstrate an optimizer for data flows that is able to reorder operators with MapReduce-style UDFs written in an imperative language. Our approach leverages static code analysis to extract information from UDFs, which is used to reason about the reorderability of UDF operators. This information is sufficient to enumerate a large fraction of the search space covered by conventional RDBMS optimizers, including filter and aggregation push-down, bushy join orders, and choice of physical execution strategies based on interesting properties. We demonstrate our optimizer and a job submission client that allows users to peek step-by-step into each phase of the optimization process: the static code analysis of UDFs, the enumeration of reordered candidate data flows, the generation of physical execution plans, and their parallel execution. For the demonstration, we provide a selection of relational and non-relational data flow programs which highlight the salient features of our approach.
Keywords
data flow computing; optimisation; relational databases; user interfaces; MapReduce; UDF; data flow programs; data processing systems; data-intensive processing; imperative language; optimization; relational DBMS; user-defined functions; Data mining; Data processing; Data visualization; Optimization; Programming; Query processing; Runtime;
fLanguage
English
Publisher
ieee
Conference_Titel
Data Engineering (ICDE), 2013 IEEE 29th International Conference on
Conference_Location
Brisbane, QLD
ISSN
1063-6382
Print_ISBN
978-1-4673-4909-3
Electronic_ISBN
1063-6382
Type
conf
DOI
10.1109/ICDE.2013.6544927
Filename
6544927