DocumentCode :
2991881
Title :
Statistics-based Workload Modeling for MapReduce
Author :
Yang, Hailong ; Luan, Zhongzhi ; Li, Wenjun ; Qian, Depei ; Guan, Gang
Author_Institution :
Sch. of Comput. Sci., Beihang Univ., Beijing, China
fYear :
2012
fDate :
21-25 May 2012
Firstpage :
2043
Lastpage :
2051
Abstract :
Large-scale data-intensive computing with the MapReduce framework in the cloud is becoming pervasive in the core business of many academic, government, and industrial organizations. Hadoop is by far the most successful realization of the MapReduce framework. While MapReduce is easy to use, efficient, and reliable for data-intensive computations, the large number of configuration parameters in Hadoop makes it unexpectedly challenging to run varied workloads effectively on a Hadoop cluster. Consequently, developers with little experience of the Hadoop configuration system may devote significant effort to writing an application that performs poorly, either because they do not know how these configurations influence performance or because they are not even aware that the configurations exist. In this paper, we propose a statistical analysis approach to identify the relationships among workload characteristics, Hadoop configurations, and workload performance. Several non-intuitive relationships between workload characteristics and relative performance are revealed, and the experimental results demonstrate that our regression models accurately predict the performance of MapReduce workloads under different Hadoop configurations.
Keywords :
cloud computing; configuration management; data handling; regression analysis; ubiquitous computing; Hadoop cluster; Hadoop configuration system; Hadoop configurations; MapReduce framework; cloud computing; configuration parameters; data-intensive computation; large-scale data-intensive computing; nonintuitive relationship; performance prediction; regression model; statistic analysis approach; statistics-based workload modeling; workload characteristics; workload performance; Benchmark testing; Computational modeling; Correlation; Measurement; Parallel processing; Principal component analysis; Tuning; MapReduce; analytical model; data intensive computing; performance prediction;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2012 IEEE 26th International
Conference_Location :
Shanghai
Print_ISBN :
978-1-4673-0974-5
Type :
conf
DOI :
10.1109/IPDPSW.2012.254
Filename :
6270413