DocumentCode
2456918
Title
Extending Map-Reduce for Efficient Predicate-Based Sampling
Author
Grover, Raman ; Carey, Michael J.
Author_Institution
Dept. of Comput. Sci., Univ. of California, Irvine, CA, USA
fYear
2012
fDate
1-5 April 2012
Firstpage
486
Lastpage
497
Abstract
In this paper we address the problem of using MapReduce to sample a massive data set in order to produce a fixed-size sample whose contents satisfy a given predicate. While it is simple to express this computation using MapReduce, its default Hadoop execution is dependent on the input size and is wasteful of cluster resources. This is unfortunate, as sampling queries are fairly common (e.g., for exploratory data analysis at Facebook), and the resulting waste can significantly impact the performance of a shared cluster. To address such use cases, we present the design, implementation and evaluation of a Hadoop execution model extension that supports incremental job expansion. Under this model, a job consumes input as required and can dynamically govern its resource consumption while producing the required results. The proposed mechanism is able to support a variety of policies regarding job growth rates as they relate to cluster capacity and current load. We have implemented the mechanism in Hadoop, and we present results from an experimental performance study of different job growth policies under both single- and multi-user workloads.
Keywords
data handling; Hadoop execution model extension; MapReduce; cluster capacity; cluster resource; fixed-size sample; incremental job expansion; job growth policy; massive data sampling; multiuser workload; predicate-based sampling; resource consumption; single-user workload; Availability; Delay; Facebook; Indexes; Load modeling; Runtime; Time factors;
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Engineering (ICDE), 2012 IEEE 28th International Conference on
Conference_Location
Washington, DC
ISSN
1063-6382
Print_ISBN
978-1-4673-0042-1
Type
conf
DOI
10.1109/ICDE.2012.104
Filename
6228108