• DocumentCode
    652247
  • Title
    A Comparative Study of Job Scheduling Strategies in Large-Scale Parallel Computational Systems
  • Author
    Chandio, Aftab Ahmed ; Xu, Cheng-Zhong ; Tziritas, Nikos ; Bilal, Kashif ; Khan, Samee U.
  • Author_Institution
    Shenzhen Inst. of Adv. Technol., Shenzhen, China
  • fYear
    2013
  • fDate
    16-18 July 2013
  • Firstpage
    949
  • Lastpage
    957
  • Abstract
    With the advent of High Performance Computing (HPC) in large-scale parallel computational environments, job scheduling and resource allocation techniques are required to deliver Quality of Service (QoS) and effective resource management. Job scheduling on large-scale parallel systems has therefore been studied with the aims of: (a) minimizing the queue time and response time, and (b) maximizing the overall system utilization. We compare thirteen job scheduling policies and analyze their behavior. The set of job scheduling policies includes: (a) priority-based policies, (b) first fit, (c) backfilling techniques, and (d) window-based policies. All of the policies are extensively simulated and compared, using a real data center workload comprising 22385 jobs. We analyze the: (a) queue time, (b) response time, and (c) slowdown ratio to evaluate the policies. Moreover, we present a comprehensive workload characterization that can be used as a tool for optimizing a system's performance and for scheduler design. We investigate four categories of workload characteristics: (a) Narrow, (b) Wide, (c) Short, and (d) Long, for detailed analysis of the schedulers' performance. This study highlights the strengths and weaknesses of various job scheduling policies and helps to choose an appropriate job scheduling policy in a given scenario.
  • Keywords
    computer centres; parallel processing; performance evaluation; quality of service; resource allocation; scheduling; HPC; QoS; Quality of Service; comparative study; data center; high performance computing; job scheduling; job scheduling strategies; large scale parallel computational systems; parallel computational environment; queue time; resource allocation techniques; resource management; Dynamic scheduling; Optimal scheduling; Processor scheduling; Quality of service; Resource management; System performance; Data center; Job Scheduling; Large-scale Parallel Computational Systems; Workload Characterization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Trust, Security and Privacy in Computing and Communications (TrustCom), 2013 12th IEEE International Conference on
  • Conference_Location
    Melbourne, VIC
  • Type
    conf
  • DOI
    10.1109/TrustCom.2013.116
  • Filename
    6680936