Title :
The Failure-rate Aware Scheduling Policies for Large-scale Cluster Systems
Author :
Linping Wu; Chao Ren; Dan Meng; Jianfeng Zhan; Bibo Tu
Author_Institution :
Inst. of Comput. Technol., Chinese Acad. of Sci., Beijing
Abstract :
As cluster systems grow in scale, node failures have become one of the major obstacles to using large-scale clusters. Traditional cluster scheduling policies consider only factors such as job priority and node load, and ignore the node failure rate. Job scheduling in a cluster system can be divided into two sub-processes: job selection and node allocation. In this paper, we introduce several scheduling policies that take the node failure rate into account so that more dependable nodes are chosen during node allocation. Finally, we evaluate the policies using discrete event-driven simulation. The simulation results show that the failure-rate aware scheduling policies yield better system performance than a random node allocation policy.
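The node allocation sub-process described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each node has a known constant failure rate and that node lifetimes are independent and exponentially distributed. The `Node` class, the function names, and the failure-rate values are hypothetical. Ranking free nodes by failure rate is one simple failure-rate aware heuristic and is not necessarily any of the specific policies proposed in the paper. The random allocator stands in for the random node allocation baseline.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical cluster node with an assumed constant failure rate (failures/hour)."""
    name: str
    failure_rate: float


def allocate_failure_rate_aware(free_nodes, k):
    """Pick the k free nodes with the lowest failure rate (most dependable first)."""
    if k > len(free_nodes):
        raise ValueError("not enough free nodes")
    return sorted(free_nodes, key=lambda n: n.failure_rate)[:k]


def allocate_random(free_nodes, k, rng):
    """Baseline: pick k free nodes uniformly at random, ignoring failure rates."""
    if k > len(free_nodes):
        raise ValueError("not enough free nodes")
    return rng.sample(free_nodes, k)


def success_probability(nodes, runtime_hours):
    """Probability that no allocated node fails while the job runs.

    Assumes independent, exponentially distributed node lifetimes, so
    P(success) = exp(-(sum of failure rates) * runtime).
    """
    total_rate = sum(n.failure_rate for n in nodes)
    return math.exp(-total_rate * runtime_hours)


if __name__ == "__main__":
    # Illustrative failure rates only; not data from the paper.
    rates = [0.001, 0.02, 0.005, 0.05, 0.002, 0.01]
    cluster = [Node(f"n{i}", r) for i, r in enumerate(rates)]
    rng = random.Random(42)

    aware = allocate_failure_rate_aware(cluster, 3)
    baseline = allocate_random(cluster, 3, rng)

    print("failure-rate aware:", [n.name for n in aware],
          round(success_probability(aware, 10.0), 4))
    print("random:            ", [n.name for n in baseline],
          round(success_probability(baseline, 10.0), 4))
```

Under these assumptions, choosing the k lowest-rate nodes minimizes the summed failure rate, so no random allocation of the same size can give a 10-hour job a higher chance of finishing without a node failure. A full evaluation, like the discrete event-driven simulation in the paper, would also model job arrivals, queueing, and failure-induced restarts.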
Keywords :
discrete event simulation; scheduling; workstation clusters; discrete event-driven simulation; failure-rate aware scheduling; job scheduling; job selection; large-scale cluster systems; node allocation; Discrete event simulation; Exponential distribution; Large-scale systems; Processor scheduling; Random variables; Research and development; Shape; Supercomputers; Weather forecasting; Weibull distribution;
Conference_Title :
Parallel and Distributed Computing, Applications and Technologies, 2006. PDCAT '06. Seventh International Conference on
Conference_Location :
Taipei
Print_ISBN :
0-7695-2736-1
DOI :
10.1109/PDCAT.2006.109