• DocumentCode
    3298737
  • Title

    Performance Modeling of MapReduce Jobs in Heterogeneous Cloud Environments

  • Author

    Zhang, Zhuoyao; Cherkasova, Ludmila; Loo, Boon Thau

  • Author_Institution
    Univ. of Pennsylvania, Philadelphia, PA, USA
  • fYear
    2013
  • fDate
    June 28 2013-July 3 2013
  • Firstpage
    839
  • Lastpage
    846
  • Abstract
    Many companies start using Hadoop for advanced data analytics over large datasets. While a traditional Hadoop cluster deployment assumes a homogeneous cluster, many enterprise clusters are grown incrementally over time, and might have a variety of different servers in the cluster. The nodes' heterogeneity represents an additional challenge for efficient cluster and job management. Due to resource heterogeneity, it is often unclear which resources introduce inefficiency and bottlenecks, and how such a Hadoop cluster should be configured and optimized. In this work, we explore the efficiency and performance accuracy of the bounds-based performance model for predicting the MapReduce job completion times in heterogeneous Hadoop clusters. We validate the accuracy of the proposed performance model using a diverse set of 13 realistic applications and two different heterogeneous clusters. Since one of the Hadoop clusters is formed by different capacity VM instances in Amazon EC2 environment, we additionally explore and discuss factors that impact the MapReduce job performance in the Cloud.
  • Keywords
    cloud computing; data analysis; Amazon EC2 environment; Hadoop cluster deployment; MapReduce job completion times; advanced data analytics; bounds-based performance model; heterogeneous cloud environments; performance modeling; Accuracy; Computational modeling; Electronic publishing; Encyclopedias; Predictive models; Upper bound; MapReduce; efficiency; heterogeneous clusters
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cloud Computing (CLOUD), 2013 IEEE Sixth International Conference on
  • Conference_Location
    Santa Clara, CA
  • Print_ISBN
    978-0-7695-5028-2
  • Type

    conf

  • DOI
    10.1109/CLOUD.2013.107
  • Filename
    6740232