• DocumentCode
    1998986
  • Title
    Dataset Scaling and MapReduce Performance
  • Author
    Zhang, Fan; Sakr, Majd
  • Author_Institution
    Dept. of Comput. Sci., Carnegie Mellon Univ. in Qatar, Doha, Qatar
  • fYear
    2013
  • fDate
    20-24 May 2013
  • Firstpage
    1683
  • Lastpage
    1690
  • Abstract
    Predicting execution behavior of MapReduce applications when scaling the input dataset presents a challenging problem. The difficulty lies in the distributed locations of input data and the distributed, virtualized compute resources that utilize different network substrates. The potential payoff lies in using small datasets and limited test runs to understand how applications will behave with "big data." Our research has developed an in-depth understanding of MapReduce application performance and analyzed the impact of scaling input datasets. While we might expect that "embarrassingly parallel" MapReduce jobs should scale linearly with input dataset size, our results show that execution time sometimes increases nonlinearly. To verify our predictions, we identify a benchmark set of Map-, Shuffle-, and Reduce-intensive applications. Experimental results show that our execution-time analysis distinguishes four typical application behaviors when scaling input datasets.
  • Keywords
    benchmark testing; parallel processing; software performance evaluation; virtualisation; MapReduce application execution behavior prediction; MapReduce application performance; MapReduce jobs; dataset scaling; distributed data locations; execution-time analysis; map-intensive applications; reduce-intensive applications; shuffle-intensive applications; virtualized compute resources; Analytical models; Benchmark testing; Computational modeling; Mathematical model; Parallel processing; Scalability; TV; Cloud computing; MapReduce applications; dataset size; input scaling; parallel computing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2013 IEEE 27th International
  • Conference_Location
    Cambridge, MA
  • Print_ISBN
    978-0-7695-4979-8
  • Type
    conf
  • DOI
    10.1109/IPDPSW.2013.143
  • Filename
    6651066