Title :
Balancing job performance with system performance via locality-aware scheduling on torus-connected systems
Author :
Xu Yang ; Zhou Zhou ; Wei Tang ; Xingwu Zheng ; Jia Wang ; Zhiling Lan
Author_Institution :
Dept. of Comput. Sci., Illinois Inst. of Technol., Chicago, IL, USA
Abstract :
Torus-connected networks are widely used in modern supercomputers due to their linear per-node cost scaling and competitive overall performance. The job scheduling system plays a critical role in the efficient use of supercomputers. As supercomputers continue to grow in size, a fundamental problem arises: how can job performance be effectively balanced with system performance on torus-connected machines? In this work, we present a new scheduling design named window-based locality-aware scheduling. Our design contains three novel features. First, rather than scheduling jobs one by one, our design takes a “window” of jobs, i.e., multiple jobs, into consideration for job prioritization and resource allocation. Second, our design maintains a list of slots to preserve node contiguity information for resource allocation. Finally, we formulate scheduling decision making as a 0-1 Multiple Knapsack Problem and present two algorithms to solve it. A series of trace-based simulations using job logs collected from production supercomputers indicates that this new scheduling design has real potential and can effectively balance job performance and system performance.
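The knapsack view in the abstract can be illustrated with a minimal sketch. It assumes that the jobs in the window are the items (weight = requested node count), that each slot is a knapsack whose capacity is its number of contiguous free nodes, and that a job must fit entirely inside one slot. The function name, the data layout, and the largest-job-first, best-fit heuristic are illustrative assumptions; the abstract does not specify the two algorithms the authors present, and this is not a reproduction of either.

```python
from typing import Dict, List, Tuple


def greedy_window_allocation(
    window: List[Tuple[str, int]],  # hypothetical layout: (job_id, nodes_requested)
    slots: List[int],               # hypothetical layout: free contiguous nodes per slot
) -> Dict[str, int]:
    """Map each placed job to one slot index; unplaced jobs are omitted."""
    remaining = list(slots)  # work on a copy so the caller's list is untouched
    assignment: Dict[str, int] = {}

    # Consider larger jobs first; the stable sort keeps window order on ties.
    for job_id, size in sorted(window, key=lambda job: -job[1]):
        best = None
        # Best fit: pick the smallest remaining slot that still holds the job,
        # so larger contiguous slots stay available for other jobs.
        for index, capacity in enumerate(remaining):
            if capacity >= size and (best is None or capacity < remaining[best]):
                best = index
        if best is not None:
            remaining[best] -= size
            assignment[job_id] = best

    return assignment


if __name__ == "__main__":
    jobs = [("a", 512), ("b", 1024), ("c", 256), ("d", 2048)]
    free_slots = [1024, 512, 512]
    print(greedy_window_allocation(jobs, free_slots))
```

In this example job "d" asks for more nodes than any single slot offers, so it stays in the queue, while the other three jobs each land in a slot that holds them contiguously. A greedy pass like this is only a heuristic and does not guarantee an optimal solution to the 0-1 Multiple Knapsack Problem.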
Keywords :
knapsack problems; mobile computing; parallel machines; performance evaluation; processor scheduling; 0-1 multiple knapsack problem; competitive overall performance; job performance; job prioritizing; job scheduling system; node contiguity information; resource allocation; scheduling decision making; scheduling design; supercomputers; system performance; torus-connected machines; torus-connected network; trace-based simulations; window-based locality-aware scheduling; Approximation algorithms; Greedy algorithms; Processor scheduling; Resource management; Scheduling; Supercomputers; System performance;
Conference_Title :
Cluster Computing (CLUSTER), 2014 IEEE International Conference on
Conference_Location :
Madrid
DOI :
10.1109/CLUSTER.2014.6968751