DocumentCode
652247
Title
A Comparative Study of Job Scheduling Strategies in Large-Scale Parallel Computational Systems
Author
Chandio, Aftab Ahmed ; Cheng-Zhong Xu ; Tziritas, Nikos ; Bilal, Kashif ; Khan, Samee U.
Author_Institution
Shenzhen Inst. of Adv. Technol., Shenzhen, China
fYear
2013
fDate
16-18 July 2013
Firstpage
949
Lastpage
957
Abstract
With the advent of High Performance Computing (HPC) in large-scale parallel computational environments, job scheduling and resource allocation techniques are required to deliver Quality of Service (QoS) and resource management. Job scheduling on large-scale parallel systems has therefore been studied to: (a) minimize queue time and response time, and (b) maximize overall system utilization. We compare thirteen job scheduling policies and analyze their behavior. The set of job scheduling policies includes: (a) priority-based policies, (b) first fit, (c) backfilling techniques, and (d) window-based policies. All of the policies are extensively simulated and compared. A real data center workload comprising 22385 jobs is used for simulation. We analyze the: (a) queue time, (b) response time, and (c) slowdown ratio to evaluate the policies. Moreover, we present a comprehensive workload characterization that can be used as a tool for optimizing a system's performance and for scheduler design. We investigate four categories of workload characteristics: (a) Narrow, (b) Wide, (c) Short, and (d) Long, for detailed analysis of the schedulers' performance. This study highlights the strengths and weaknesses of various job scheduling policies and helps in choosing an appropriate job scheduling policy for a given scenario.
Keywords
computer centres; parallel processing; performance evaluation; quality of service; resource allocation; scheduling; HPC; QoS; Quality of Service; comparative study; data center; high performance computing; job scheduling; job scheduling strategies; large scale parallel computational systems; parallel computational environment; queue time; resource allocation techniques; resource management; Dynamic scheduling; Optimal scheduling; Processor scheduling; Quality of service; Resource management; System performance; Data center; Job Scheduling; Large-scale Parallel Computational Systems; Workload Characterization;
fLanguage
English
Publisher
IEEE
Conference_Titel
Trust, Security and Privacy in Computing and Communications (TrustCom), 2013 12th IEEE International Conference on
Conference_Location
Melbourne, VIC
Type
conf
DOI
10.1109/TrustCom.2013.116
Filename
6680936