Direct MPI Library for Intel Xeon Phi Co-Processors

Author

Min Si ; Ishikawa, Yozo ; Tatagi, Masamichi

Author_Institution

Dept. of Comput. Sci., Univ. of Tokyo, Tokyo, Japan

fYear

2013

fDate

20-24 May 2013

Firstpage

816

Lastpage

824

Abstract

DCFA-MPI is an MPI library implementation for Intel Xeon Phi co-processor clusters, where a compute node consists of an Intel Xeon Phi co-processor card connected to the host via PCI Express with InfiniBand. DCFA-MPI enables direct data transfer between Intel Xeon Phi co-processors without assistance from the host. DCFA, a direct communication facility for many-core based accelerators, gives Xeon Phi co-processor user-space programs direct InfiniBand communication functionality with the same interface as that on the host processor, so direct InfiniBand communication between Xeon Phi co-processors could easily be developed. Using DCFA, an MPI library able to perform direct inter-node communication between Xeon Phi co-processors has been designed and implemented. The implementation is based on the Mellanox InfiniBand HCA and the pre-production version of the Intel Xeon Phi co-processor. DCFA-MPI delivers 3 times greater bandwidth than the "Intel MPI on Xeon Phi co-processors" mode, and a 2 to 12 times speed-up over the "Intel MPI on Xeon where it offloads computation to Xeon Phi co-processors" mode in communication with 2 MPI processes. It also shows a 2 to 4 times speed-up over the "Intel MPI on Xeon Phi co-processors" mode and the "Intel MPI on Xeon where it offloads computation to Xeon Phi co-processors" mode in a five-point stencil computation parallelized with MPI + OpenMP using 8 processes × 56 threads.

Keywords

application program interfaces; coprocessors; message passing; multiprocessing systems; open systems; peripheral interfaces; DCFA-MPI; Intel Xeon Phi coprocessor card; MPI processes; Mellanox InfiniBand HCA; OpenMP; PCI Express; data transfer; direct InfiniBand communication functionality; direct MPI library; direct communication facility; direct internode communication; many-core-based accelerators; preproduction version; stencil computation; threads parallelization; Computational modeling; Computer architecture; Data transfer; Kernel; Libraries; Protocols; Receivers; InfiniBand; MPI library; Xeon Phi; accelerator; co-processor; direct communication

fLanguage

English

Publisher

IEEE

Conference_Titel

Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2013 IEEE 27th International

Conference_Location

Cambridge, MA

Print_ISBN

978-0-7695-4979-8

Type

conf

DOI

10.1109/IPDPSW.2013.179

Filename

6650960