  • DocumentCode
    166653
  • Title
    Balancing job performance with system performance via locality-aware scheduling on torus-connected systems
  • Author
    Xu Yang; Zhou Zhou; Wei Tang; Xingwu Zheng; Jia Wang; Zhiling Lan
  • Author_Institution
    Dept. of Comput. Sci., Illinois Inst. of Technol., Chicago, IL, USA
  • fYear
    2014
  • fDate
    22-26 Sept. 2014
  • Firstpage
    140
  • Lastpage
    148
  • Abstract
    Torus-connected networks are widely used in modern supercomputers due to their linear per-node cost scaling and competitive overall performance. The job scheduling system plays a critical role in the efficient use of supercomputers. As supercomputers continue to grow in size, a fundamental problem arises: how can job performance be effectively balanced with system performance on torus-connected machines? In this work, we present a new scheduling design named window-based locality-aware scheduling. Our design contains three novel features. First, rather than scheduling jobs one by one, our design considers a “window” of jobs, i.e., multiple jobs, for job prioritizing and resource allocation. Second, our design maintains a list of slots to preserve node contiguity information for resource allocation. Finally, we formulate scheduling decision making as a 0-1 Multiple Knapsack Problem and present two algorithms to solve it. A series of trace-based simulations using job logs collected from production supercomputers indicates that this new scheduling design has real potential and can effectively balance job performance and system performance.
  • Keywords
    knapsack problems; mobile computing; parallel machines; performance evaluation; processor scheduling; 0-1 multiple knapsack problem; competitive overall performance; job performance; job prioritizing; job scheduling system; node contiguity information; resource allocation; scheduling decision making; scheduling design; supercomputers; system performance; torus-connected machines; torus-connected network; trace-based simulations; window-based locality-aware scheduling; Approximation algorithms; Greedy algorithms; Processor scheduling; Resource management; Scheduling; Supercomputers; System performance;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cluster Computing (CLUSTER), 2014 IEEE International Conference on
  • Conference_Location
    Madrid
  • Type
    conf
  • DOI
    10.1109/CLUSTER.2014.6968751
  • Filename
    6968751
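The abstract casts window-based scheduling as a 0-1 Multiple Knapsack Problem: each job in the window is either placed into exactly one slot or deferred, the nodes placed in a slot may not exceed that slot's capacity, and the total value of the placed jobs is maximized. The sketch below illustrates that formulation under simplifying assumptions; it is not either of the two algorithms presented in the paper. The `Job` fields, the use of a per-job `value` as a priority weight, and the reduction of each slot to a plain node count (ignoring torus geometry and contiguity details) are all hypothetical. `greedy_mkp` is a common density-ordered, best-fit heuristic, and `exact_mkp` is an exhaustive search that is only practical for small windows and serves as a reference.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Job:
    """A queued job in the scheduling window (hypothetical fields)."""
    name: str
    nodes: int      # number of nodes requested (assumed > 0)
    value: float    # scheduling utility, e.g. a priority weight (assumed)


def greedy_mkp(window: List[Job], slots: List[int]) -> Tuple[Dict[str, int], float]:
    """Greedy heuristic for the 0-1 multiple knapsack problem.

    Each slot is modeled only by its number of free nodes (a simplification).
    Jobs are considered in order of decreasing value per node, and each is
    placed into the best-fitting slot: the one with the least remaining
    capacity that can still hold it. Jobs that fit nowhere stay in the queue.
    Returns a mapping job name -> slot index and the total value placed.
    """
    remaining = list(slots)
    assignment: Dict[str, int] = {}
    total = 0.0
    order = sorted(window, key=lambda j: (j.value / j.nodes, j.value), reverse=True)
    for job in order:
        best: Optional[int] = None
        for i, free in enumerate(remaining):
            if job.nodes <= free and (best is None or free < remaining[best]):
                best = i
        if best is not None:
            remaining[best] -= job.nodes
            assignment[job.name] = best
            total += job.value
    return assignment, total


def exact_mkp(window: List[Job], slots: List[int]) -> float:
    """Exhaustive optimum (exponential time), usable only for small windows."""
    def search(k: int, remaining: List[int]) -> float:
        if k == len(window):
            return 0.0
        job = window[k]
        best = search(k + 1, remaining)  # option: defer job k
        for i, free in enumerate(remaining):
            if job.nodes <= free:
                remaining[i] -= job.nodes  # try job k in slot i
                best = max(best, job.value + search(k + 1, remaining))
                remaining[i] += job.nodes  # undo the placement
        return best
    return search(0, list(slots))


# Usage: a window of four jobs and two free slots of 6 and 4 nodes.
window = [Job("A", 4, 8.0), Job("B", 3, 5.0), Job("C", 2, 3.0), Job("D", 5, 6.0)]
slots = [6, 4]
assignment, value = greedy_mkp(window, slots)
print(assignment, value)  # {'A': 1, 'B': 0, 'C': 0} 16.0
```

In this example the greedy pass schedules A, B and C and defers D, which matches the exhaustive optimum of 16.0. With other inputs a density-ordered greedy can fall below the optimum: for instance, in a single 4-node slot, one 3-node job worth 3.0 crowds out two 2-node jobs worth 1.9 each.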