DocumentCode :
2789543
Title :
High Performance MPI on IBM 12x InfiniBand Architecture
Author :
Vishnu, Abhinav ; Benton, Brad ; Panda, Dhabaleswar K.
Author_Institution :
Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH
fYear :
2007
fDate :
26-30 March 2007
Firstpage :
1
Lastpage :
8
Abstract :
InfiniBand is becoming increasingly popular in the area of cluster computing due to its open standard and high performance. I/O interfaces like PCI-express and GX+ are being introduced as next generation technologies to drive InfiniBand with very high throughput. HCAs with throughput of 8x on PCI-express have become available. Recently, support for HCAs with 12x throughput on GX+ has been announced. In this paper, we design a message passing interface (MPI) on IBM 12x dual-port HCAs, which consist of multiple send/recv engines per port. We propose and study the impact of various communication scheduling policies (binding, striping and round robin). Based on this study, we present a new policy, EPC (enhanced point-to-point and collective), which incorporates different kinds of communication patterns; point-to-point (blocking, non-blocking) and collective communication, for data transfer. We implement our design and evaluate it with micro-benchmarks, collective communication and NAS parallel benchmarks. Using EPC on a 12x InfiniBand cluster with one HCA and one port, we can improve the performance by 41% with pingpong latency test and 63-65% with the unidirectional and bi-directional bandwidth tests, compared with the default single-rail MPI implementation. Our evaluation on NAS parallel benchmarks shows an improvement of 7-13% in execution time for integer sort and Fourier transform.
Keywords :
Fourier transforms; application program interfaces; computer architecture; message passing; peripheral interfaces; Fourier transform; HCA; IBM 12x InfiniBand architecture; PCI-express; application program interface; cluster computing; communication scheduling policy; data transfer; high performance MPI; message passing; peripheral interface; Bandwidth; Benchmark testing; Bidirectional control; Computer architecture; Delay; Engines; High performance computing; Message passing; Round robin; Throughput;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Parallel and Distributed Processing Symposium, 2007. IPDPS 2007. IEEE International
Conference_Location :
Long Beach, CA
Print_ISBN :
1-4244-0910-1
Electronic_ISBN :
1-4244-0910-1
Type :
conf
DOI :
10.1109/IPDPS.2007.370407
Filename :
4228135
Link To Document :
بازگشت