• DocumentCode
    2322031
  • Title
    Kernel-Assisted MPI Collective Communication among Many-core Clusters
  • Author
    Ma, Teng
  • Author_Institution
    Dept. of Electr. Eng. & Comput. Sci., Univ. of Tennessee, Knoxville, TN, USA
  • fYear
    2012
  • fDate
    13-16 May 2012
  • Firstpage
    741
  • Lastpage
    745
  • Abstract
    Architectural hierarchies and hardware complexity brought by multicore or many-core clusters greatly challenge MPI applications' performance in two ways: performance efficiency and cross-platform portability. The cross-platform portability assumption, "write once and efficiently run everywhere," is not guaranteed by current MPI libraries, mainly due to implementation details. To partially address the performance issue exposed by hardware complexity and memory hierarchies, we propose a kernel-assisted MPI collective communication approach, directly based on the kernel-assisted one-sided single-copy module KNEM. First, we introduce the general operating principles of KNEM memory copy, and then we present the design and implementation of KNEM collective, an intra-node collective component of Open MPI for shared-memory nodes. Additionally, we describe how to integrate the kernel-assisted approach into collective communications on heterogeneous multicore clusters with intra- and inter-node communication. We evaluate and experimentally demonstrate the performance advantages of our kernel-assisted MPI collective over state-of-the-art MPI libraries (Open MPI and MVAPICH2).
  • Keywords
    application program interfaces; computational complexity; message passing; operating system kernels; performance evaluation; shared memory systems; KNEM; KNEM memory copy; MPI libraries; MVAPICH2; Open MPI; architectural hierarchies; cross-platform portability; general operating principles; hardware complexity; heterogeneous multicore clusters; intranode collective component; kernel assisted one-sided single-copy module; kernel-assisted MPI collective communication; many-core clusters; memory hierarchies; performance efficiency; shared memory nodes; Bandwidth; Complexity theory; Hardware; Kernel; Libraries; Multicore processing; Receivers; HPC; MPI; cluster; collective communication; kernel; multicore
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cluster, Cloud and Grid Computing (CCGrid), 2012 12th IEEE/ACM International Symposium on
  • Conference_Location
    Ottawa, ON
  • Print_ISBN
    978-1-4673-1395-7
  • Type
    conf
  • DOI
    10.1109/CCGrid.2012.38
  • Filename
    6217504