DocumentCode :
2052747
Title :
Automatic Task Re-organization in MapReduce
Author :
Guo, Zhenhua ; Pierce, Marlon ; Fox, Geoffrey ; Zhou, Mo
Author_Institution :
Sch. of Inf. & Comput., Indiana Univ., Bloomington, IN, USA
fYear :
2011
fDate :
26-30 Sept. 2011
Firstpage :
335
Lastpage :
343
Abstract :
MapReduce is increasingly considered a useful parallel programming model for large-scale data processing. It exploits parallelism in the execution of primitive map and reduce operations. Hadoop is an open source implementation of MapReduce that has been used in both academic research and industrial production. However, its strategy of having each map task process exactly one data block limits the degree of concurrency and degrades performance, because available resources cannot be fully utilized. In addition, its assumption that task execution times within each phase do not vary much does not always hold, which renders speculative execution useless. In this paper, we present mechanisms that dynamically split and consolidate tasks to improve load balancing and to break through the concurrency limit imposed by fixed task granularity. For single-job systems, we propose two algorithms: one for the case where prior knowledge is available and one for the case where it is not. For multi-job cases, we propose a modified shortest-job-first strategy which, when combined with task splitting, theoretically minimizes job turnaround time. We compared the effectiveness of our approach with the default task scheduling strategy using both synthesized and trace-based workloads. Simulation results show that our approach improves performance significantly.
Keywords :
large-scale systems; parallel programming; resource allocation; scheduling; Hadoop; MapReduce; automatic task reorganization; large-scale data processing; load balancing; parallel programming; shortest-job-first strategy; task granularity; task scheduling; task splitting; Clustering algorithms; Concurrent computing; Data models; Educational institutions; Load management; Scheduling; Skeleton; Bag-of-Divisible-Tasks; Load Balancing; MapReduce; Task Splitting;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Cluster Computing (CLUSTER), 2011 IEEE International Conference on
Conference_Location :
Austin, TX
Print_ISBN :
978-1-4577-1355-2
Electronic_ISBN :
978-0-7695-4516-5
Type :
conf
DOI :
10.1109/CLUSTER.2011.44
Filename :
6061152