• DocumentCode
    3244834
  • Title

    An Evaluation of Communication Factors on an Adaptive Control Strategy for Job Co-allocation in Multiple HPC Clusters

  • Author

    Qin, Jinhui ; Bauer, Michael A.

  • Author_Institution
    Dept. of Comput. Sci., Univ. of Western Ontario, London, ON, Canada
  • fYear
    2009
  • fDate
    8-11 Dec. 2009
  • Firstpage
    391
  • Lastpage
    398
  • Abstract
    Allocating multi-process jobs across multiple connected high performance computing clusters, i.e., job co-allocation, offers the possibility of more efficient use of computing resources, reduced turnaround time, and computations that use more processes than there are processors on any single cluster. Effective co-allocation ultimately depends on the inter-cluster communication cost. We previously introduced a scalable co-allocation strategy, the maximum bandwidth adjacent cluster set (MBAS) strategy, which uses two thresholds to control job co-allocation: one dealing with inter-cluster links and one controlling job partitioning. We subsequently introduced the adaptive threshold control system (ATCS), which uses a fuzzy control approach to dynamically adjust these thresholds within MBAS. Results suggested that using ATCS during MBAS job co-allocation could achieve an overall performance improvement. However, these results considered only jobs that involved either master-slave or all-to-all communication among constituent processes. In this paper, we extend this analysis by also considering jobs that exhibit 2D-mesh communication patterns, and we evaluate ATCS further.
  • Keywords
    adaptive control; fuzzy control; telecommunication control; workstation clusters; 2D-mesh communication pattern; adaptive threshold control system; communication factor; high performance computing; intercluster link; job coallocation; job partitioning; maximum bandwidth adjacent cluster set; multiple HPC cluster; scalable coallocation strategy; Adaptive control; Adaptive systems; Bandwidth; Communication system control; Computer networks; Control systems; Costs; High performance computing; Programmable control; Resource management; high-performance computing clusters; job co-allocation; resource management
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Systems (ICPADS), 2009 15th International Conference on
  • Conference_Location
    Shenzhen
  • ISSN
    1521-9097
  • Print_ISBN
    978-1-4244-5788-5
  • Type

    conf

  • DOI
    10.1109/ICPADS.2009.36
  • Filename
    5395301