Title :
Performance Modeling of MapReduce Jobs in Heterogeneous Cloud Environments
Author :
Zhuoyao Zhang ; Cherkasova, Ludmila ; Loo, Boon Thau
Author_Institution :
Univ. of Pennsylvania, Philadelphia, PA, USA
fDate :
June 28 2013-July 3 2013
Abstract :
Many companies are starting to use Hadoop for advanced data analytics over large datasets. While a traditional Hadoop cluster deployment assumes a homogeneous cluster, many enterprise clusters grow incrementally over time and may contain a variety of different servers. The nodes' heterogeneity represents an additional challenge for efficient cluster and job management. Due to resource heterogeneity, it is often unclear which resources introduce inefficiency and bottlenecks, and how such a Hadoop cluster should be configured and optimized. In this work, we explore the efficiency and accuracy of the bounds-based performance model for predicting MapReduce job completion times in heterogeneous Hadoop clusters. We validate the accuracy of the proposed performance model using a diverse set of 13 realistic applications and two different heterogeneous clusters. Since one of the Hadoop clusters is formed by VM instances of different capacities in the Amazon EC2 environment, we additionally explore and discuss factors that impact MapReduce job performance in the Cloud.
Keywords :
cloud computing; data analysis; Amazon EC2 environment; Hadoop cluster deployment; MapReduce job completion times; advanced data analytics; bounds-based performance model; heterogeneous cloud environments; performance modeling; Accuracy; Computational modeling; Electronic publishing; Encyclopedias; Predictive models; Upper bound; MapReduce; efficiency; heterogeneous clusters
Conference_Title :
Cloud Computing (CLOUD), 2013 IEEE Sixth International Conference on
Conference_Location :
Santa Clara, CA
Print_ISBN :
978-0-7695-5028-2
DOI :
10.1109/CLOUD.2013.107