DocumentCode
2194261
Title
BAR: An Efficient Data Locality Driven Task Scheduling Algorithm for Cloud Computing
Author
Jin, Jiahui ; Luo, Junzhou ; Song, Aibo ; Dong, Fang ; Xiong, Runqun
Author_Institution
Sch. of Comput. Sci. & Eng., Southeast Univ., Nanjing, China
fYear
2011
fDate
23-26 May 2011
Firstpage
295
Lastpage
304
Abstract
Large-scale data processing has become increasingly common in recent years in cloud computing systems such as MapReduce, Hadoop, and Dryad. In these systems, files are split into many small blocks, and each block is replicated on several servers. To process files efficiently, each job is divided into many tasks, and each task is allocated to a server to process one file block. Because network bandwidth is a scarce resource in these systems, enhancing task data locality (placing tasks on servers that contain their input blocks) is crucial for reducing job completion time. Although many approaches to improving data locality have been proposed, most of them are either greedy and ignore global optimization, or suffer from high computational complexity. To address these problems, we propose a heuristic task scheduling algorithm called Balance-Reduce (BAR), which first produces an initial task allocation and then gradually reduces the job completion time by tuning that allocation. By taking a global view, BAR can adjust data locality dynamically according to network state and cluster workload. Simulation results show that BAR can handle large problem instances in a few seconds and outperforms previous related algorithms in terms of job completion time.
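The abstract describes BAR only at a high level: build an initial allocation, then tune it to reduce job completion time, trading some data locality for balance. The following is a minimal Python sketch of that two-phase idea, not the paper's algorithm, and it makes several assumptions. A task costs a flat `local_cost` if its server holds a replica of its block and a larger flat `remote_cost` otherwise, which stands in for network state. Each server starts with an existing workload, which stands in for cluster workload. Job completion time is the largest server finish time. All names (`Task`, `balance`, `reduce_phase`, the cost parameters) are hypothetical, and the greedy balance step and the move-acceptance rule are illustrative choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """One task reading one input block (hypothetical, simplified model)."""
    name: str
    replicas: frozenset  # servers holding a replica of this task's block


def task_cost(task, server, local_cost, remote_cost):
    # Local if the server stores a replica; otherwise the block must be
    # fetched over the network, modelled here as a flat, larger cost.
    return local_cost if server in task.replicas else remote_cost


def server_loads(assignment, initial_load, local_cost, remote_cost):
    # Finish time of each server = existing workload + cost of its tasks.
    loads = dict(initial_load)
    for task, server in assignment.items():
        loads[server] += task_cost(task, server, local_cost, remote_cost)
    return loads


def balance(tasks, initial_load, local_cost):
    """Phase 1 (assumed): all-local allocation, least-loaded replica first."""
    loads = dict(initial_load)
    assignment = {}
    # Tasks with fewer replicas have fewer choices, so place them first.
    for task in sorted(tasks, key=lambda t: (len(t.replicas), t.name)):
        server = min(task.replicas, key=lambda s: (loads[s], s))
        assignment[task] = server
        loads[server] += local_cost
    return assignment


def reduce_phase(assignment, initial_load, local_cost, remote_cost):
    """Phase 2 (assumed): move tasks off the busiest server while it helps."""
    assignment = dict(assignment)
    while True:
        loads = server_loads(assignment, initial_load, local_cost, remote_cost)
        busiest = max(loads, key=lambda s: (loads[s], s))
        idlest = min(loads, key=lambda s: (loads[s], s))
        makespan = loads[busiest]
        # Candidate moves: a task on the busiest server re-placed on the idlest.
        candidates = [
            (loads[idlest] + task_cost(t, idlest, local_cost, remote_cost), t.name, t)
            for t, s in assignment.items()
            if s == busiest
        ]
        if not candidates:
            return assignment, makespan
        new_load, _, task = min(candidates, key=lambda c: (c[0], c[1]))
        # Accept only if the receiver stays strictly below the current maximum,
        # so completion time never grows and the loop is guaranteed to stop.
        if new_load >= makespan:
            return assignment, makespan
        assignment[task] = idlest


servers = {"A": 0, "B": 0, "C": 0}
tasks = [Task(f"t{i}", frozenset({"A"})) for i in range(1, 7)]
initial = balance(tasks, servers, local_cost=1)
print(max(server_loads(initial, servers, 1, 2).values()))  # 6
final, span = reduce_phase(initial, servers, local_cost=1, remote_cost=2)
print(span)  # 4
```

In this toy instance, all six blocks live only on server A. The all-local allocation therefore finishes at time 6. The tuning phase turns two tasks into remote tasks on B and C and finishes at time 4. This illustrates why a purely locality-greedy placement can lengthen job completion time.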
Keywords
cloud computing; computational complexity; file servers; resource allocation; BAR; Dryad system; Hadoop system; MapReduce system; balance-reduce algorithm; cloud computing; computational complexity; data locality driven task scheduling algorithm; file servers; job completion time; large scale data processing; scarce resource network bandwidth; task allocation; Cloud computing; Clustering algorithms; Heuristic algorithms; Processor scheduling; Resource management; Scheduling; Servers; Cloud Computing; Data Locality; Dryad; Hadoop; Task Scheduling;
fLanguage
English
Publisher
ieee
Conference_Titel
Cluster, Cloud and Grid Computing (CCGrid), 2011 11th IEEE/ACM International Symposium on
Conference_Location
Newport Beach, CA
Print_ISBN
978-1-4577-0129-0
Electronic_ISBN
978-0-7695-4395-6
Type
conf
DOI
10.1109/CCGrid.2011.55
Filename
5948620