Dataset Scaling and MapReduce Performance

Author

Zhang, Fan ; Sakr, Majd

Author_Institution

Dept. of Comput. Sci., Carnegie Mellon Univ. in Qatar, Doha, Qatar

fYear

2013

fDate

20-24 May 2013

Firstpage

1683

Lastpage

1690

Abstract

Predicting the execution behavior of MapReduce applications as the input dataset grows is a challenging problem. The difficulty stems from the distributed placement of input data and from distributed, virtualized compute resources that use different network substrates. The potential payoff is the ability to use small datasets and limited test runs to understand how applications will behave with "big data." This work analyzes MapReduce application performance in depth, focusing on the impact of scaling the input dataset. Although one might expect "embarrassingly parallel" MapReduce jobs to scale linearly with input dataset size, our results show that execution time sometimes grows nonlinearly. To verify our predictions, we identify a benchmark set of Map-, Shuffle-, and Reduce-intensive applications. Experimental results show that our execution-time analysis distinguishes four typical application behaviors as input datasets scale.
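
The abstract contrasts the linear scaling one might expect from "embarrassingly parallel" jobs with the nonlinear growth observed in practice. One generic way to make that distinction concrete, assuming execution time follows a power law time ≈ c · size^k, is to estimate k as the least-squares slope of log(time) against log(size). The sketch below is an illustration only, not the authors' execution-time analysis: the function names, the tolerance of 0.1, and the three labels are hypothetical, and it does not reproduce the paper's four application behaviors.

```python
import math
from typing import Sequence


def scaling_exponent(sizes: Sequence[float], times: Sequence[float]) -> float:
    """Estimate k in time ~ c * size**k (hypothetical helper, illustration only).

    Fits a least-squares line to (log size, log time); the slope is k.
    Sizes and times must be positive.
    """
    if len(sizes) != len(times) or len(sizes) < 2:
        raise ValueError("need at least two (size, time) pairs of equal length")
    xs = [math.log(s) for s in sizes]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("input sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def classify(k: float, tol: float = 0.1) -> str:
    """Label an exponent; the tolerance and labels are assumptions, not from the paper."""
    if k > 1 + tol:
        return "superlinear"
    if k < 1 - tol:
        return "sublinear"
    return "linear"
```

With hypothetical measurements `sizes = [1, 2, 4, 8]` and `times = [10 * s for s in sizes]`, `scaling_exponent` returns approximately 1.0 (linear); `times = [10 * s * s for s in sizes]` gives approximately 2.0 (superlinear). A fixed per-job startup cost makes small runs look sublinear, so in practice the fit should use inputs large enough for that overhead to be negligible, or model the overhead as a separate constant term.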

Keywords

benchmark testing; parallel processing; software performance evaluation; virtualisation; MapReduce application execution behavior prediction; MapReduce application performance; MapReduce jobs; dataset scaling; distributed data locations; execution-time analysis; map-intensive applications; reduce-intensive applications; shuffle-intensive applications; virtualized compute resources; Analytical models; Computational modeling; Mathematical model; Scalability; TV; Cloud computing; MapReduce applications; dataset size; input scaling; parallel computing

fLanguage

English

Publisher

IEEE

Conference_Titel

Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2013 IEEE 27th International

Conference_Location

Cambridge, MA

Print_ISBN

978-0-7695-4979-8

Type

conf

DOI

10.1109/IPDPSW.2013.143

Filename

6651066