DocumentCode :
3434962
Title :
Cluster-Size Scaling and MapReduce Execution Times
Author :
Zhang, Fan ; Sakr, Majd
Author_Institution :
Massachusetts Inst. of Technol., Cambridge, MA, USA
Volume :
1
fYear :
2013
fDate :
2-5 Dec. 2013
Firstpage :
240
Lastpage :
249
Abstract :
Understanding performance scalability in MapReduce applications is a challenging problem. The difficulty lies in the distributed locations of input data and in distributed compute resources that use varied network substrates. User-defined Map and Reduce stages, with numerous application parameters, further complicate the problem. Using small datasets and limited test runs to understand how MapReduce applications will behave with "big data" can have a significant payoff. In this paper, we evaluate the impact of cluster-size scaling on execution time for a set of Map- and Reduce-intensive applications. We model the MapReduce framework, specify the conditions and implications of power-law conformity, and verify our model with data from benchmark MapReduce applications. Empirical results indicate that: (1) within a range of scaling parameters, MapReduce execution times follow a power-law distribution; (2) power-law scalability for Map-intensive applications starts from a small cluster size; (3) Shuffle-intensive applications exhibit power-law behavior starting from larger clusters; and (4) cluster-scaling performance gains fail to show power-law behavior when computing resources far exceed those needed. Our findings will facilitate using small-scale test runs to allocate and configure virtual and physical computing resources in large-scale clouds.
Keywords :
cloud computing; parallel programming; power aware computing; virtual machines; Big data; MapReduce execution time; benchmark MapReduce applications; cluster-size scaling; distributed compute resources; distributed locations; large scale clouds; model verification; performance scalability; physical computing resources; power-law conformity; power-law distribution; power-law scalability; virtual computing resources; Analytical models; Bandwidth; Benchmark testing; Computational modeling; Lead; Mathematical model; Scalability; Cluster scaling; Large-scale over-provisioning; MapReduce applications; Power-law distribution; Small-scale limitation;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Cloud Computing Technology and Science (CloudCom), 2013 IEEE 5th International Conference on
Conference_Location :
Bristol
Type :
conf
DOI :
10.1109/CloudCom.2013.39
Filename :
6753804
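Illustrative note on the abstract: it says that, within a range of cluster sizes, execution time follows a power law, and that small-scale runs can guide provisioning at larger scale. The sketch below shows one way such a relationship could be fitted and extrapolated. It is not the authors' model. It assumes the form T = a * n^b (with n the cluster size), fits it by ordinary least squares in log-log space, and uses made-up timings. The function name `fit_power_law` and all numbers are hypothetical.

```python
import math


def fit_power_law(cluster_sizes, exec_times):
    """Fit T = a * n**b by least squares on (log n, log T); return (a, b).

    Assumed functional form for illustration only; not taken from the paper.
    """
    xs = [math.log(n) for n in cluster_sizes]
    ys = [math.log(t) for t in exec_times]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    # Slope of the regression line in log-log space is the exponent b.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    # Intercept in log space gives log(a).
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic timings (seconds) generated from T = 1200 * n**-0.8.
# These are invented values, not measurements from the paper.
sizes = [2, 4, 8, 16]
times = [1200.0 * n ** -0.8 for n in sizes]

a, b = fit_power_law(sizes, times)

# Extrapolate from the small-scale runs to a hypothetical 64-node cluster.
predicted_64 = a * 64 ** b
print(f"a = {a:.1f}, b = {b:.3f}, predicted T(64) = {predicted_64:.1f} s")
```

As the abstract cautions, an extrapolation like this would only be meaningful inside the range where power-law behavior holds: not for very small clusters running Shuffle-intensive jobs, and not when resources far exceed what the job needs.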