Title :
Direct MPI Library for Intel Xeon Phi Co-Processors
Author :
Min Si ; Ishikawa, Yutaka ; Takagi, Masamichi
Author_Institution :
Dept. of Comput. Sci., Univ. of Tokyo, Tokyo, Japan
Abstract :
DCFA-MPI is an MPI library implementation for Intel Xeon Phi co-processor clusters, where a compute node consists of an Intel Xeon Phi co-processor card connected to the host via PCI Express, with InfiniBand as the inter-node network. DCFA-MPI enables direct data transfer between Intel Xeon Phi co-processors without assistance from the host. DCFA, a direct communication facility for many-core based accelerators, gives user-space programs on the Xeon Phi co-processor direct InfiniBand communication through the same interface as on the host processor, so direct InfiniBand communication between Xeon Phi co-processors can be developed easily. Using DCFA, an MPI library able to perform direct inter-node communication between Xeon Phi co-processors has been designed and implemented. The implementation is based on the Mellanox InfiniBand HCA and the pre-production version of the Intel Xeon Phi co-processor. DCFA-MPI delivers 3 times greater bandwidth than the 'Intel MPI on Xeon Phi co-processors' mode, and a 2 to 12 times speed-up compared to the 'Intel MPI on Xeon where it offloads computation to Xeon Phi co-processors' mode, in communication between 2 MPI processes. It also shows a 2 to 4 times speed-up over the 'Intel MPI on Xeon where it offloads computation to Xeon Phi co-processors' mode in a five-point stencil computation parallelized with MPI + OpenMP using 8 processes × 56 threads.
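The stencil benchmark mentioned above splits a grid across MPI processes, and each process needs its neighbors' boundary rows (halos) before it can update its own block. The following is a minimal sketch, not the authors' code: the Jacobi update, the row-block decomposition, and the names stencil_sweep and decomposed_sweep are illustrative assumptions. It simulates the ranks in a single process, copying halo rows where a real MPI + OpenMP code would exchange them (for example with MPI_Sendrecv) and thread the inner loops, and checks that the decomposed sweep matches the global one.

```python
# Illustrative sketch (assumption): one Jacobi sweep of a five-point stencil,
# first on the whole grid, then on row blocks as separate MPI ranks would do.
# Halo rows are copied in-process here instead of being sent over InfiniBand.


def stencil_sweep(grid):
    """Return a new grid where each interior point is the average of its four
    neighbors (Jacobi update for the five-point Laplace stencil).
    Boundary rows and columns are kept fixed."""
    n_rows, n_cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for i in range(1, n_rows - 1):
        for j in range(1, n_cols - 1):
            out[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return out


def decomposed_sweep(grid, n_ranks):
    """Split the interior rows into n_ranks blocks (hypothetical ranks), give
    each block one halo row above and below, sweep each block independently,
    and stitch the updated rows back into a full grid."""
    n_rows = len(grid)
    interior = n_rows - 2
    # bounds[r]..bounds[r + 1] is the half-open range of rows owned by rank r.
    bounds = [1 + r * interior // n_ranks for r in range(n_ranks + 1)]
    out = [row[:] for row in grid]
    for r in range(n_ranks):
        start, stop = bounds[r], bounds[r + 1]
        if start == stop:
            continue  # this rank owns no rows
        # Owned rows plus halo rows start - 1 and stop (the "received" data).
        local = [row[:] for row in grid[start - 1:stop + 1]]
        swept = stencil_sweep(local)
        out[start:stop] = swept[1:-1]  # drop the halo rows again
    return out


if __name__ == "__main__":
    n = 10
    # Hypothetical test grid: top boundary row held at 1.0, all else 0.0.
    grid = [[1.0] * n] + [[0.0] * n for _ in range(n - 1)]
    assert decomposed_sweep(grid, n_ranks=3) == stencil_sweep(grid)
    print("decomposed sweep matches global sweep")
```

Because each block performs exactly the same arithmetic as the global sweep once its halo rows are present, the results agree exactly; without the halo rows, the points next to block edges would be computed from missing data, which is why per-iteration neighbor communication dominates such benchmarks.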
Keywords :
application program interfaces; coprocessors; message passing; multiprocessing systems; open systems; peripheral interfaces; DCFA-MPI; Intel Xeon Phi coprocessor card; MPI processes; Mellanox InfiniBand HCA; OpenMP; PCI Express; data transfer; direct InfiniBand communication functionality; direct MPI library; direct communication facility; direct internode communication; many-core-based accelerators; preproduction version; stencil computation; threads parallelization; Computational modeling; Computer architecture; Data transfer; Kernel; Libraries; Protocols; Receivers; InfiniBand; MPI library; Xeon Phi; accelerator; co-processor; direct communication
Conference_Title :
Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2013 IEEE 27th International
Conference_Location :
Cambridge, MA
Print_ISBN :
978-0-7695-4979-8
DOI :
10.1109/IPDPSW.2013.179