• DocumentCode
    3200683
  • Title

    Resource and Deadline-Aware Job Scheduling in Dynamic Hadoop Clusters

  • Author

    Dazhao Cheng ; Jia Rao ; Changjun Jiang ; Xiaobo Zhou

  • Author_Institution
    Dept. of Comput. Sci., Univ. of Colorado, Colorado Springs, CO, USA
  • fYear
    2015
  • fDate
    25-29 May 2015
  • Firstpage
    956
  • Lastpage
    965
  • Abstract
    As Hadoop is becoming increasingly popular in large-scale data analysis, there is a growing need for providing predictable services to users who have strict requirements on job completion times. While earliest deadline first (EDF)-like scheduling algorithms are popular for guaranteeing job deadlines in real-time systems, they are not effective in a dynamic Hadoop environment, i.e., a Hadoop cluster with dynamically available resources. As a growing number of Hadoop clusters are deployed on hybrid systems, e.g., infrastructure powered by a mix of traditional and renewable energy, and cloud platforms hosting heterogeneous workloads, variable resource availability becomes common when running Hadoop jobs. In this paper, we propose RDS, a Resource and Deadline-aware Hadoop job Scheduler that takes future resource availability into consideration when minimizing job deadline misses. We formulate the job scheduling problem as an online optimization problem and solve it using an efficient receding horizon control algorithm. To aid the control, we design a self-learning model to estimate job completion times and use a simple but effective model to predict future resource availability. We have implemented RDS in the open source Hadoop implementation and performed evaluations with various benchmark workloads. Experimental results show that RDS reduces the penalty of deadline misses by at least 36% and 10% compared with Fair Scheduler and EDF scheduler, respectively.
  • Keywords
    cloud computing; data analysis; parallel processing; processor scheduling; real-time systems; unsupervised learning; EDF scheduler; RDS; cloud platform; deadline-aware job scheduling; dynamic Hadoop cluster; dynamic Hadoop environment; earliest deadline first scheduling; fair scheduler; future resource availability; heterogeneous workload; job completion time; job deadline misses; job scheduling problem; large-scale data analysis; online optimization problem; open source Hadoop implementation; real-time system; receding horizon control algorithm; renewable energy; resource and deadline-aware Hadoop job scheduler; self-learning model; traditional energy; variable resource availability; Dynamic scheduling; Heuristic algorithms; Job shop scheduling; Optimization; Predictive models; Resource management; Deadline-aware; Dynamic Hadoop Clusters; Job Scheduling;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Processing Symposium (IPDPS), 2015 IEEE International
  • Conference_Location
    Hyderabad
  • ISSN
    1530-2075
  • Type

    conf

  • DOI
    10.1109/IPDPS.2015.36
  • Filename
    7161581