DocumentCode
1666454
Title
Machine Learning-Based Configuration Parameter Tuning on Hadoop System
Author
Chi-Ou Chen ; Ye-Qi Zhuo ; Chao-Chun Yeh ; Che-Min Lin ; Shih-Wei Liao
Author_Institution
Comput. Intell. Technol. Center, Ind. Technol. Res. Inst., Hsinchu, Taiwan
fYear
2015
Firstpage
386
Lastpage
392
Abstract
Apache Hadoop is a software framework that processes large-scale datasets across a cluster of distributed machines using the MapReduce programming model. However, system administrators face two main challenges in managing a Hadoop system: (1) the behaviors and characteristics of large-scale distributed systems are complicated, which makes it difficult to tune parameters appropriately, and (2) dozens of configuration parameters affect system performance, which makes the parameter tuning task troublesome. In this paper, we focus on optimizing Hadoop MapReduce job performance by tuning configuration parameters, and we propose an analytical method that helps system administrators choose approximately optimal configuration parameters based on the characteristics of each application. Our approach has two key phases: prediction and optimization. The prediction phase estimates the performance of a MapReduce job, whereas the optimization phase searches strategically for approximately optimal configuration parameters by invoking the predictor repeatedly. In our evaluation, our approach helps system administrators achieve performance about 2X to 8X better than traditional methods.
Keywords
learning (artificial intelligence); parallel processing; Apache Hadoop system; Hadoop MapReduce job performance; MapReduce programming model; approximately optimal configuration parameter; configuration parameters tuning task; distributed machine; large-scale dataset; large-scale distributed system; machine learning-based configuration parameter tuning; software framework; system administrator; system performance; Accuracy; Approximation methods; Optimization; Predictive models; Regression tree analysis; System performance; Tuning; Distributed System; Machine Learning; Optimization Problem;
fLanguage
English
Publisher
IEEE
Conference_Titel
Big Data (BigData Congress), 2015 IEEE International Congress on
Conference_Location
New York, NY
Print_ISBN
978-1-4673-7277-0
Type
conf
DOI
10.1109/BigDataCongress.2015.64
Filename
7207248
Link To Document