• DocumentCode
    2194261
  • Title
    BAR: An Efficient Data Locality Driven Task Scheduling Algorithm for Cloud Computing
  • Author
    Jin, Jiahui ; Luo, Junzhou ; Song, Aibo ; Dong, Fang ; Xiong, Runqun
  • Author_Institution
    Sch. of Comput. Sci. & Eng., Southeast Univ., Nanjing, China
  • fYear
    2011
  • fDate
    23-26 May 2011
  • Firstpage
    295
  • Lastpage
    304
  • Abstract
    Large-scale data processing has become increasingly common in cloud computing systems such as MapReduce, Hadoop, and Dryad. In these systems, files are split into many small blocks, and each block is replicated across several servers. To process files efficiently, each job is divided into many tasks, and each task is allocated to a server to process one file block. Because network bandwidth is a scarce resource in these systems, improving task data locality (placing tasks on servers that hold their input blocks) is crucial to reducing job completion time. Although many approaches to improving data locality have been proposed, most of them are either greedy and ignore global optimization, or suffer from high computational complexity. To address these problems, we propose a heuristic task scheduling algorithm called Balance-Reduce (BAR), which first produces an initial task allocation and then gradually reduces the job completion time by tuning that allocation. By taking a global view, BAR can adjust data locality dynamically according to network state and cluster workload. Simulation results show that BAR can handle large problem instances in a few seconds and outperforms previous related algorithms in terms of job completion time.
  • Keywords
    cloud computing; computational complexity; file servers; resource allocation; BAR; Dryad system; Hadoop system; MapReduce system; balance-reduce algorithm; data locality driven task scheduling algorithm; job completion time; large scale data processing; scarce resource network bandwidth; task allocation; Clustering algorithms; Heuristic algorithms; Processor scheduling; Resource management; Scheduling; Servers; Data Locality; Dryad; Hadoop; Task Scheduling
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cluster, Cloud and Grid Computing (CCGrid), 2011 11th IEEE/ACM International Symposium on
  • Conference_Location
    Newport Beach, CA
  • Print_ISBN
    978-1-4577-0129-0
  • Electronic_ISBN
    978-0-7695-4395-6
  • Type
    conf
  • DOI
    10.1109/CCGrid.2011.55
  • Filename
    5948620