• DocumentCode
    108121
  • Title

    Guarantee Strict Fairness and Utilize Prediction Better in Parallel Job Scheduling

  • Author

    Yulai Yuan ; Yongwei Wu ; Weimin Zheng ; Keqin Li

  • Author_Institution
    Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
  • Volume
    25
  • Issue
    4
  • fYear
    2014
  • fDate
    Apr-14
  • Firstpage
    971
  • Lastpage
    981
  • Abstract
    As the most widely used parallel job scheduling strategy, EASY backfilling achieved great success, not only because it can balance fairness and performance, but also because it is universally applicable to most HPC systems. However, unfairness still exists in EASY. Our simulation shows that a blocked job can be delayed by later jobs for more than 90 hours on real workloads. Additionally, directly employing runtime prediction techniques in EASY would lead to a serious situation called reservation violation. In this paper, we aim at guaranteeing strict fairness (no job is delayed by any jobs of lower priority) while achieving attractive performance, and employing prediction without causing reservation violation in parallel job scheduling. We propose two novel strategies, namely, shadow load preemption (SLP) and venture backfilling (VB), which are integrated into EASY to construct preemptive venture EASY backfilling (PV-EASY). Experimental results on three real HPC workloads demonstrate that PV-EASY is more attractive than EASY in parallel job scheduling, from both academic and industry perspectives.
  • Keywords
    parallel processing; scheduling; EASY backfilling; HPC systems; SLP strategy; VB strategy; high performance computing systems; parallel job scheduling strategy; reservation violation; runtime prediction techniques; shadow load preemption strategy; strict fairness guarantee; venture backfilling strategy; Delays; Job shop scheduling; Processor scheduling; Program processors; Runtime; Checkpoints; modeling and prediction; parallel system; scheduling; virtualization
  • fLanguage
    English
  • Journal_Title
    Parallel and Distributed Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1045-9219
  • Type

    jour

  • DOI
    10.1109/TPDS.2013.88
  • Filename
    6487497