Title :
MC2: Map Concurrency Characterization for MapReduce on the Cloud
Author :
Hammoud, Mohammad ; Sakr, Majd F.
Author_Institution :
Carnegie Mellon Univ. in Qatar, Doha, Qatar
fDate :
June 28 2013-July 3 2013
Abstract :
MapReduce is now a pervasive analytics engine on the cloud. Hadoop is an open source implementation of MapReduce and is currently enjoying wide popularity. Hadoop offers a high-dimensional space of configuration parameters, which makes it difficult for practitioners to set them for efficient and cost-effective execution. In this work we observe that MapReduce application performance is highly influenced by map concurrency. Map concurrency is defined in terms of two configurable parameters: the number of available map slots and the number of map tasks running over the slots. We show that some inherent MapReduce characteristics enable well-informed prediction of map concurrency. We propose Map Concurrency Characterization (MC2), a standalone utility program that can predict the best map concurrency for any given MapReduce application. By leveraging the predicted information, MC2 can judiciously guide Map phase configuration and, consequently, improve Hadoop performance. Unlike many relevant schemes, MC2 does not employ simulation, dynamic instrumentation, and/or static analysis of unmodified job code to predict map concurrency. Instead, MC2 utilizes a simple yet effective mathematical model that exploits the MapReduce characteristics that impact map concurrency. We implemented MC2 and conducted comprehensive experiments on a private cloud and on Amazon EC2 using Hadoop 0.20.2. Our results show that MC2 can correctly predict the best map concurrencies for the tested benchmarks and provide up to 2.2X speedup in runtime.
Keywords :
cloud computing; concurrency control; data analysis; public domain software; Amazon EC2; Hadoop 0.20.2; Hadoop performance; MC2 standalone utility program; Map phase configuration; MapReduce application; configurable parameters; configuration parameters; cost-effective execution; high-dimensional space; map concurrency characterization; map slots; map tasks; open source implementation; pervasive analytics engine; predicted information; private cloud; static analysis; unmodified job code; Benchmark testing; Concurrent computing; Engines; Equations; Mathematical model; Runtime; Time factors; Hadoop; Map Concurrency; Map Concurrency Characterization; MapReduce;
Conference_Titel :
Cloud Computing (CLOUD), 2013 IEEE Sixth International Conference on
Conference_Location :
Santa Clara, CA
Print_ISBN :
978-0-7695-5028-2
DOI :
10.1109/CLOUD.2013.93