• DocumentCode
    2053980
  • Title
    Locality-Aware Parallel Process Mapping for Multi-core HPC Systems
  • Author
    Hursey, Joshua ; Squyres, Jeffrey M. ; Dontje, Terry
  • Author_Institution
    Oak Ridge Nat. Lab., Oak Ridge, TN, USA
  • fYear
    2011
  • fDate
    26-30 Sept. 2011
  • Firstpage
    527
  • Lastpage
    531
  • Abstract
    High Performance Computing (HPC) systems are composed of servers containing an ever-increasing number of cores. With such high processor core counts, non-uniform memory access (NUMA) architectures are almost universally used to reduce inter-processor and memory communication bottlenecks by distributing processors and memory throughout a server-internal networking topology. Application studies have shown that tuning process placement in a server's NUMA networking topology to the application can have a dramatic impact on performance. The performance implications are magnified when running a parallel job across multiple server nodes, especially with large-scale HPC applications. This paper presents the Locality-Aware Mapping Algorithm (LAMA) for distributing the individual processes of a parallel application across processing resources in an HPC system, paying particular attention to the internal server NUMA topologies. The algorithm is able to support both homogeneous and heterogeneous hardware systems, and dynamically adapts to the available hardware and user-specified process layout at run-time. As implemented in Open MPI, the LAMA provides 362,880 mapping permutations and is able to naturally scale out to additional hardware resources as they become available in future architectures.
  • Keywords
    multiprocessing systems; parallel processing; HPC application; NUMA architecture; heterogeneous hardware system; high performance computing; homogeneous hardware system; interprocessor; locality-aware mapping algorithm; locality-aware parallel process mapping; memory communication; multicore HPC system; nonuniform memory access; processor core counts; Hardware; Heuristic algorithms; Instruction sets; Layout; Servers; Sockets; Topology; Locality; MPI; NUMA; Process Affinity; Resource Management;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cluster Computing (CLUSTER), 2011 IEEE International Conference on
  • Conference_Location
    Austin, TX
  • Print_ISBN
    978-1-4577-1355-2
  • Electronic_ISBN
    978-0-7695-4516-5
  • Type
    conf
  • DOI
    10.1109/CLUSTER.2011.59
  • Filename
    6061201
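
Note on the abstract's permutation count: 362,880 equals 9!, the number of ways to order nine distinct items. A minimal sketch follows, assuming the count arises from ordering nine hardware resource levels. That reading is an assumption: this record does not list the levels, and the level names and helper functions below are hypothetical placeholders, not Open MPI identifiers.

```python
import math
from itertools import islice, permutations

# Hypothetical level labels, assumed for illustration only;
# this record does not enumerate the levels LAMA iterates over.
LEVELS = ["node", "board", "socket", "numa", "l3", "l2", "l1", "core", "hwthread"]


def count_orderings(levels):
    """Return the number of distinct orderings of the given levels (n!)."""
    return math.factorial(len(levels))


def first_orderings(levels, k):
    """Return the first k orderings lazily, without building all n! tuples."""
    return list(islice(permutations(levels), k))


if __name__ == "__main__":
    print(count_orderings(LEVELS))
    for ordering in first_orderings(LEVELS, 3):
        print(" > ".join(ordering))
```

Under this assumption, adding a tenth level would raise the count to 10! = 3,628,800, which is one way to read the abstract's remark about scaling out to additional hardware resources.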