Title :
Exploiting Common Subexpressions for Cloud Query Processing
Author :
Silva, Yasin N. ; Larson, Per-Ake ; Zhou, Jingren
Author_Institution :
Arizona State Univ., Glendale, AZ, USA
Abstract :
Many companies now routinely run massive data analysis jobs -- expressed in some scripting language -- on large clusters of low-end servers. Many analysis scripts are complex and contain common subexpressions, that is, intermediate results that are subsequently joined and aggregated in multiple different ways. Applying conventional optimization techniques to such scripts will produce plans that execute a common subexpression multiple times, once for each consumer, which is clearly wasteful. Moreover, different consumers may have different physical requirements on the result: one consumer may want it partitioned on column A and another may want it partitioned on column B. To find a truly optimal plan, the optimizer must trade off such conflicting requirements in a cost-based manner. In this paper we show how to extend a Cascade-style optimizer to correctly optimize scripts containing common subexpressions. The approach has been prototyped in SCOPE, Microsoft's system for massive data analysis. Experimental analysis of both simple and large real-world scripts shows that the extended optimizer produces plans with 21% to 57% lower estimated costs.
Keywords :
authoring languages; cloud computing; costing; data analysis; optimisation; query processing; Microsoft system; SCOPE; analysis scripts; cascade-style optimizer; cloud query processing; common subexpressions; conflicting requirements; conventional optimization techniques; cost-based manner; estimated costs; extended optimizer; low-end servers; massive data analysis jobs; physical requirements; real-world scripts; scripting language; sub expression multiple times; Companies; Data analysis; History; Object recognition; Optimization; Query processing; USA Councils;
Conference_Title :
Data Engineering (ICDE), 2012 IEEE 28th International Conference on
Conference_Location :
Washington, DC
Print_ISBN :
978-1-4673-0042-1
DOI :
10.1109/ICDE.2012.106