Kernel-Assisted MPI Collective Communication among Many-core Clusters

Author

Ma, Teng

Author_Institution

Dept. of Electr. Eng. & Comput. Sci., Univ. of Tennessee, Knoxville, TN, USA

fYear

2012

fDate

13-16 May 2012

Firstpage

741

Lastpage

745

Abstract

Architectural hierarchies and hardware complexity brought by multicore or many-core clusters greatly challenge MPI applications' performance in two ways: performance efficiency and cross-platform portability. The cross-platform portability assumption, "write once and efficiently run everywhere," is not guaranteed by current MPI libraries, mainly due to implementation details. To partially address the performance issue exposed by hardware complexity and memory hierarchies, we propose a kernel-assisted MPI collective communication approach based directly on KNEM, a kernel-assisted one-sided single-copy module. First, we introduce the general operating principles of KNEM memory copy; then we present the design and implementation of KNEM collective, an intra-node collective component of Open MPI for shared-memory nodes. Additionally, we describe how to integrate the kernel-assisted approach into collective communications on heterogeneous multicore clusters with intra- and inter-node communication. We experimentally evaluate and demonstrate the performance advantages of our kernel-assisted MPI collective over state-of-the-art MPI libraries (Open MPI and MVAPICH2).

Keywords

application program interfaces; computational complexity; message passing; operating system kernels; performance evaluation; shared memory systems; KNEM; KNEM memory copy; MPI libraries; MVAPICH2; Open MPI; architectural hierarchies; cross-platform portability; general operating principles; hardware complexity; heterogeneous multicore clusters; intranode collective component; kernel assisted one-sided single-copy module; kernel-assisted MPI collective communication; many-core clusters; memory hierarchies; performance efficiency; shared memory nodes; Bandwidth; Complexity theory; Hardware; Kernel; Libraries; Multicore processing; Receivers; HPC; MPI; cluster; collective communication; kernel; multicore

fLanguage

English

Publisher

IEEE

Conference_Titel

Cluster, Cloud and Grid Computing (CCGrid), 2012 12th IEEE/ACM International Symposium on

Conference_Location

Ottawa, ON

Print_ISBN

978-1-4673-1395-7

Type

conf

DOI

10.1109/CCGrid.2012.38

Filename

6217504