DocumentCode
2266292
Title
Improving Data Locality of MapReduce by Scheduling in Homogeneous Computing Environments
Author
Zhang, Xiaohong ; Zhong, Zhiyong ; Feng, Shengzhong ; Tu, Bibo ; Fan, Jianping
Author_Institution
Inst. of Comput. Technol., Chinese Acad. of Sci., Beijing, China
fYear
2011
fDate
26-28 May 2011
Firstpage
120
Lastpage
126
Abstract
Data locality is one of the critical factors that affect performance. This paper proposes a next-k-node scheduling (NKS) method to improve the data locality of map tasks. The method first calculates a probability for each map task and then preferentially schedules the task with the highest probability. It assigns low probabilities to tasks that would satisfy node locality on the nodes that are about to issue requests, so these tasks are reserved for those nodes. We have implemented the NKS method in hadoop-0.20.2. Experimental results show that, compared with the default task scheduling method in Hadoop, the NKS method reduced the number of map tasks processed without node locality by 78%, reduced the network load caused by these tasks by 77%, and improved the performance of Hadoop MapReduce. The NKS method is therefore well suited to homogeneous environments with overloaded networks.
Keywords
data analysis; parallel processing; probability; scheduling; task analysis; Hadoop MapReduce; NKS method; data locality; hadoop-0.20.2; homogeneous computing; map tasks; next-k-node scheduling; probability; Data models; Distributed databases; Probability; Radio access networks; Schedules; Scheduling; Topology; MapReduce; cloud computing; data locality; distributed computing; network load; task scheduling;
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel and Distributed Processing with Applications (ISPA), 2011 IEEE 9th International Symposium on
Conference_Location
Busan
Print_ISBN
978-1-4577-0391-1
Electronic_ISBN
978-0-7695-4428-1
Type
conf
DOI
10.1109/ISPA.2011.14
Filename
5951893