DocumentCode :
3191606
Title :
Predator — An experience guided configuration optimizer for Hadoop MapReduce
Author :
Kewen Wang ; Xuelian Lin ; Wenzhong Tang
Author_Institution :
Sch. of Comput. Sci. & Eng., Beihang Univ., Beijing, China
fYear :
2012
fDate :
3-6 Dec. 2012
Firstpage :
419
Lastpage :
426
Abstract :
MapReduce is a distributed computing programming framework that provides an effective solution to large-scale data processing. As an open-source implementation of MapReduce, Hadoop has been widely used in practice. The performance of Hadoop MapReduce depends heavily on its configuration settings, so tuning these configuration parameters can be an effective way to improve performance. However, finding the optimal configuration settings is difficult, because MapReduce jobs are time-consuming to run and the configuration optimization problem is high-dimensional and nonlinear. In this paper, we introduce Predator, an experience-guided configuration optimizer that does not treat optimization as a pure black-box problem but instead uses experience learned from Hadoop MapReduce configuration practice to assist the optimization process. The optimizer uses the job execution time estimated by a practical MapReduce cost model as its objective function, and classifies Hadoop MapReduce parameters into groups according to their tunable levels in order to shrink the search space. Furthermore, the optimization algorithm uses subspace division to avoid becoming trapped in local optima, and it can also reduce search time by cutting the cost of visiting unpromising points in the search space. Experiments on Hadoop clusters demonstrate the effectiveness and efficiency of the optimizer.
Keywords :
distributed processing; Hadoop MapReduce; MapReduce cost model; Predator; configuration optimization; configuration settings; data processing; distributed computing programming framework; experience guided configuration optimizer; job execution time; local optimum problem; subspace division; tunable levels; Algorithm design and analysis; Cloud computing; Conferences; Linear programming; Optimization; Search problems; Tuning; Configuration; Hadoop; MapReduce; Optimization;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Cloud Computing Technology and Science (CloudCom), 2012 IEEE 4th International Conference on
Conference_Location :
Taipei
Print_ISBN :
978-1-4673-4511-8
Electronic_ISBN :
978-1-4673-4509-5
Type :
conf
DOI :
10.1109/CloudCom.2012.6427486
Filename :
6427486