Title :
Improve Parallelism of Task Execution to Optimize Utilization of MapReduce Cluster Resources
Author :
Liming Zheng ; Yao Shen
Author_Institution :
Dept. of Comput. Sci. & Eng., Shanghai Jiao Tong Univ., Shanghai, China
Abstract :
MapReduce, as a programming model, has become an important solution for large-scale data-intensive processing. It is widely used in fields such as Web search, machine learning and e-commerce. Hadoop, an open-source implementation of MapReduce, is widely used for offline processing of massive data sets. It consists of MapReduce and HDFS. In studying Hadoop, we found that its data parallelism is coarse-grained, so it cannot take full advantage of multi-core systems. This lowers the utilization and efficiency of the whole cluster. To turn Hadoop into a fine-grained data-parallel framework, we propose a strategy that scales the parallelism of execution within each map/reduce task. We implement our strategy as a new feature for Hadoop. Our experiments show that the strategy not only optimizes the utilization of MapReduce cluster resources, but also speeds up job completion by up to 3x.
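The idea of scaling parallelism inside a single task can be illustrated with a minimal sketch. It is not the authors' Hadoop implementation: the names `split_into_subtasks`, `map_subtask` and `run_map_task`, the word-count map function, and the default of four subtasks are all assumptions made for illustration. Threads are used only to keep the example short; in CPython they do not give true multi-core speedup for CPU-bound work.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_subtasks(records, num_subtasks):
    """Partition one task's input records into roughly equal contiguous chunks."""
    # Never create more subtasks than records, and always create at least one.
    num_subtasks = max(1, min(num_subtasks, len(records) or 1))
    size, extra = divmod(len(records), num_subtasks)
    chunks, start = [], 0
    for i in range(num_subtasks):
        # The first `extra` chunks take one additional record each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(records[start:end])
        start = end
    return chunks


def map_subtask(records):
    """Hypothetical map function: count words in a chunk of text lines."""
    counts = Counter()
    for line in records:
        counts.update(line.split())
    return counts


def run_map_task(records, num_subtasks=4):
    """Run one map task as several concurrent subtasks and merge their outputs."""
    chunks = split_into_subtasks(records, num_subtasks)
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        partials = list(pool.map(map_subtask, chunks))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts key by key
    return merged
```

Calling `run_map_task(lines, num_subtasks=3)` on a list of text lines would return the same word counts as a single sequential pass, with the work divided across three subtasks.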
Keywords :
data handling; multiprocessing programs; parallel processing; public domain software; task analysis; Hadoop; MapReduce cluster resources; data-parallel frame; large-scale data-intensive processing; multicore system; open-source implementation; programming model; task execution; Computational modeling; Distributed databases; Fault tolerance; Fault tolerant systems; Instruction sets; Parallel processing; Scalability; MapReduce; Multi-core; Parallelism; Resources utilization; Subtask;
Conference_Title :
Computational Science and Engineering (CSE), 2014 IEEE 17th International Conference on
Conference_Location :
Chengdu
Print_ISBN :
978-1-4799-7980-6
DOI :
10.1109/CSE.2014.144