• DocumentCode
    3199677
  • Title

    Improving Batch Scheduling on Blue Gene/Q by Relaxing 5D Torus Network Allocation Constraints

  • Author

    Zhou Zhou ; Xu Yang ; Zhiling Lan ; Rich, Paul ; Wei Tang ; Morozov, Vitali ; Desai, Narayan

  • Author_Institution
    Dept. of Comput. Sci., Illinois Inst. of Technol., Chicago, IL, USA
  • fYear
    2015
  • fDate
    25-29 May 2015
  • Firstpage
    439
  • Lastpage
    448
  • Abstract
    As systems scale toward exascale, many resources will become increasingly constrained. While some of these resources have historically been explicitly allocated, many -- such as network bandwidth, I/O bandwidth, or power -- have not. As systems continue to evolve, we expect many such resources to become explicitly managed. This change will pose critical challenges to resource management and job scheduling. In this paper, we explore the potential of relaxing network allocation constraints for Blue Gene systems. Our objective is to improve batch scheduling performance on these systems, where the partition-based interconnect architecture provides a unique opportunity to explicitly allocate network resources to jobs. This paper makes three major contributions. The first is substantial benchmarking of parallel applications, focusing on assessing application sensitivity to communication bandwidth at large scale. The second is two new scheduling schemes that use relaxed network allocation and aim to balance individual job performance with overall system performance. The third is a comparative study of our scheduling schemes versus the existing one under different workloads, using job traces collected from the 48-rack Mira, an IBM Blue Gene/Q system at Argonne National Laboratory.
  • Keywords
    parallel processing; scheduling; 5D torus network allocation constraints; Argonne National Laboratory; IBM Blue Gene/Q system; batch scheduling; job scheduling; parallel applications; resource management; Bandwidth; Benchmark testing; Network topology; Resource management; Runtime; Scheduling; Wiring;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Processing Symposium (IPDPS), 2015 IEEE International
  • Conference_Location
    Hyderabad
  • ISSN
    1530-2075
  • Type

    conf

  • DOI
    10.1109/IPDPS.2015.110
  • Filename
    7161532