DocumentCode :
2082457
Title :
Estimating the progress of MapReduce pipelines
Author :
Morton, Kristi ; Friesen, Abram ; Balazinska, Magdalena ; Grossman, Dan
Author_Institution :
Comput. Sci. & Eng. Dept., Univ. of Washington, Seattle, WA, USA
fYear :
2010
fDate :
1-6 March 2010
Firstpage :
681
Lastpage :
684
Abstract :
In parallel query-processing environments, accurate, time-oriented progress indicators could provide much utility given that inter- and intra-query execution times can have high variance. However, none of the techniques used by existing tools or available in the literature provide non-trivial progress estimation for parallel queries. In this paper, we introduce Parallax, the first such indicator. While several parallel data processing systems exist, the work in this paper targets environments where queries consist of a series of MapReduce jobs. Parallax builds on recently developed techniques for estimating the progress of single-site SQL queries, but focuses on the challenges related to parallelism and variable execution speeds. We have implemented our estimator in the Pig system and demonstrate its performance through experiments with the PigMix benchmark and other queries running in a real, small-scale cluster.
Keywords :
SQL; parallel programming; query processing; MapReduce pipelines; Parallax indicator; Pig system; inter-query execution times; intra-query execution times; nontrivial progress estimation; parallel query-processing environments; single-site SQL queries; Computer science; Data analysis; Data processing; Database systems; Feedback; Open source software; Parallel processing; Pipelines; Query processing; Turning
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Engineering (ICDE), 2010 IEEE 26th International Conference on
Conference_Location :
Long Beach, CA
Print_ISBN :
978-1-4244-5445-7
Electronic_ISBN :
978-1-4244-5444-0
Type :
conf
DOI :
10.1109/ICDE.2010.5447919
Filename :
5447919