DocumentCode :
827041
Title :
Backfilling Using System-Generated Predictions Rather than User Runtime Estimates
Author :
Tsafrir, Dan ; Etsion, Yoav ; Feitelson, Dror G.
Author_Institution :
Sch. of Comput. Sci. & Eng., Hebrew Univ., Jerusalem
Volume :
18
Issue :
6
fYear :
2007
fDate :
6/1/2007 12:00:00 AM
Firstpage :
789
Lastpage :
803
Abstract :
The most commonly used scheduling algorithm for parallel supercomputers is FCFS with backfilling, as originally introduced in the EASY scheduler. Backfilling means that short jobs are allowed to run ahead of their time provided they do not delay previously queued jobs (or at least the first queued job). To make such determinations possible, users are required to provide estimates of how long jobs will run, and jobs that violate these estimates are killed. Empirical studies have repeatedly shown that user estimates are inaccurate and that system-generated predictions based on history may be significantly better. However, predictions have not been incorporated into production schedulers, partially due to a misconception (that we resolve) claiming inaccuracy actually improves performance, but mainly because underprediction is technically unacceptable: users will not tolerate jobs being killed just because system predictions were too short. We solve this problem by divorcing kill-time from the runtime prediction and correcting predictions adaptively as needed if they are proved wrong. The end result is a surprisingly simple scheduler, which requires minimal deviations from current practices (e.g., using FCFS as the basis) and behaves exactly like EASY as far as users are concerned; nevertheless, it achieves significant improvements in performance, predictability, and accuracy. Notably, this is based on a very simple runtime predictor that just averages the runtimes of the last two jobs by the same user; counterintuitively, our results indicate that using recent data is more important than mining the history for similar jobs. All the techniques suggested in this paper can be used to enhance any backfilling algorithm and are not limited to EASY.
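The predictor and the separation of kill-time from prediction described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class and function names, the fallback to the user estimate when a user has fewer than two finished jobs, the cap at the user estimate, and the geometric correction rule are all assumptions made here for the example.

```python
from collections import defaultdict, deque


class RecentRuntimePredictor:
    """Predict a runtime as the mean of the same user's last two runtimes."""

    def __init__(self, window=2):
        self.window = window
        # Per-user history that keeps only the most recent `window` runtimes.
        self._history = defaultdict(lambda: deque(maxlen=window))

    def record(self, user, runtime):
        """Store the actual runtime of a job that has terminated."""
        self._history[user].append(runtime)

    def predict(self, user, user_estimate):
        """Return the prediction used for backfilling decisions.

        Assumption: with fewer than `window` finished jobs, fall back to the
        user estimate; never predict beyond the user estimate, since the job
        is killed at that point anyway.
        """
        past = self._history[user]
        if len(past) < self.window:
            return user_estimate
        return min(sum(past) / len(past), user_estimate)


def correct_prediction(elapsed, prediction, user_estimate, factor=2.0):
    """Raise a prediction that a still-running job has outlived.

    Assumption: grow geometrically by `factor` (must be > 1), capped at the
    user estimate. The user estimate remains the kill-time throughout.
    """
    if elapsed < prediction:
        return prediction  # prediction still holds, nothing to correct
    new = max(prediction, 1.0)  # avoid getting stuck on a zero prediction
    while new <= elapsed and new < user_estimate:
        new = min(new * factor, user_estimate)
    return new
```

In this sketch the scheduler would call `predict` when a job is submitted, use the result only for backfilling decisions, call `correct_prediction` whenever a running job outlives its prediction, and kill the job only when it reaches the user estimate.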
Keywords :
parallel machines; processor scheduling; EASY scheduler; backfilling algorithm; first come first serve order; parallel job scheduling algorithm; supercomputers; system-generated prediction; user runtime estimates; Accuracy; Delay effects; Dynamic scheduling; History; Job production systems; Measurement; Processor scheduling; Runtime; Scheduling algorithm; Supercomputers; EASY; EASY++; Parallel job scheduling; SJBF; backfilling; dynamic prediction correction; history-based predictions; performance metrics; runtime estimates; system-generated predictions
fLanguage :
English
Journal_Title :
Parallel and Distributed Systems, IEEE Transactions on
Publisher :
IEEE
ISSN :
1045-9219
Type :
jour
DOI :
10.1109/TPDS.2007.70606
Filename :
4180346