DocumentCode :
1791545
Title :
A cross-job framework for MapReduce scheduling
Author :
Xuejie Xiao ; Jian Tang ; Zhenhua Chen ; Jielong Xu ; Chonggang Wang
Author_Institution :
Dept. of Electr. Eng. & Comput. Sci., Syracuse Univ., Syracuse, NY, USA
fYear :
2014
fDate :
27-30 Oct. 2014
Firstpage :
135
Lastpage :
140
Abstract :
In this paper, we present a novel cross-job framework for MapReduce scheduling, which aims to minimize the total processing time of a sequence of related jobs by combining reduce and map phases of two consecutive jobs and streaming data between them. The proposed framework has the following desirable properties: (1) It can accelerate the execution of a sequence of related MapReduce jobs by achieving a good tradeoff between data locality and parallelism. (2) It can support all the existing MapReduce applications with no changes to their source code. (3) It is a general framework, which can work with different scheduling algorithms. We built a new MapReduce runtime system called cross-job Hadoop by integrating the proposed cross-job framework into Hadoop. We conducted extensive experiments to evaluate its performance using PageRank and an Apache Pig application. Our experimental results show that the cross-job Hadoop can significantly reduce both the total processing time of a job sequence and the size of data transferred over the network.
Keywords :
data handling; distributed processing; scheduling; Apache Pig application; MapReduce runtime system; MapReduce scheduling; PageRank; cross-job Hadoop; cross-job framework; data locality; data parallelism; job sequence; streaming data; Approximation algorithms; Estimation; Processor scheduling; Program processors; Schedules; Scheduling; Big Data; MapReduce; Resource Management; Task Scheduling;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Big Data (Big Data), 2014 IEEE International Conference on
Conference_Location :
Washington, DC
Type :
conf
DOI :
10.1109/BigData.2014.7004222
Filename :
7004222