DocumentCode :
3582734
Title :
Fast NIC based RDMA implementation for adaptive unreliable networks
Author :
Wang Shaogang ; Xu Weixia ; Wu Dan ; Pang Zhengbin
Author_Institution :
Sch. of Comput., Nat. Univ. of Defense Technol., Changsha, China
fYear :
2014
Firstpage :
302
Lastpage :
309
Abstract :
Remote Direct Memory Access (RDMA) is one of the basic communication fabrics for parallel computers, and its performance is crucial for most parallel workloads. Next-generation supercomputers usually scale to an extremely large size, with millions of computation cores connected through the interconnection network. Most interconnection networks feature out-of-order packet delivery and low system-wide reliability, so building high-performance, reliable point-to-point communication over next-generation interconnection networks is a challenging task. Current systems usually implement RDMA in one of two ways: 1) a receiver-side counter approach; or 2) a sender-side sliding window approach. We observe that these approaches work well on reliable interconnection networks but suffer obvious performance degradation on unreliable networks. This paper proposes a new hardware approach for fast RDMA transmission that provides scalable performance on unreliable networks. Our approach is a new implementation of the traditional sliding window approach: it uses a novel receiver-side sliding window that supports out-of-order packet delivery and efficiently retransmits partial RDMA data when some of the packets fail to reach the receiver. The experiments show that on low-reliability interconnection networks, our approach provides scalable performance benefits over current RDMA implementations.
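To make the idea of a receiver-side sliding window concrete, the following is a minimal Python sketch under stated assumptions; it is not the authors' NIC hardware design. It assumes an RDMA message is split into packets numbered `0..total-1`, and the class name `ReceiverWindow` and its fields are hypothetical. Packets may arrive out of order; the window base advances only over a contiguous run of received packets, and `missing()` reports the gaps below the highest packet seen so far, so a sender could retransmit just those packets rather than the whole message. A plain arrival counter, by contrast, would only reveal that some packets are missing, not which ones.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ReceiverWindow:
    """Hypothetical receiver-side sliding window for one RDMA message.

    Illustrative sketch only (assumed packet numbering 0..total-1),
    not the paper's hardware implementation.
    """
    total: int                 # number of packets in the RDMA message
    size: int                  # window size in packets
    base: int = 0              # lowest sequence number not yet received
    highest: int = 0           # one past the highest sequence number accepted
    received: set[int] = field(default_factory=set)  # out-of-order packets at or above base

    def accept(self, seq: int) -> bool:
        """Record packet `seq`; return False if it is outside the window or a duplicate."""
        if seq < self.base or seq >= min(self.base + self.size, self.total):
            return False  # already delivered, or beyond the window: drop it
        if seq in self.received:
            return False  # duplicate of an out-of-order packet
        self.received.add(seq)
        self.highest = max(self.highest, seq + 1)
        # Slide the window forward over any contiguous run starting at base.
        while self.base in self.received:
            self.received.remove(self.base)
            self.base += 1
        return True

    def missing(self) -> list[int]:
        """Gaps overtaken by later packets: candidates for partial retransmission."""
        return [s for s in range(self.base, self.highest) if s not in self.received]

    def complete(self) -> bool:
        """True once every packet of the message has been received."""
        return self.base == self.total


# Usage: packet 1 is lost or delayed while 2 and 3 arrive out of order.
w = ReceiverWindow(total=8, size=4)
for s in (0, 2, 3):
    w.accept(s)
print(w.base, w.missing())  # only packet 1 needs to be retransmitted
```

Keeping out-of-order packets buffered in the window, instead of discarding everything after a gap, is what lets the retransmission be partial: only the reported holes are resent.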
Keywords :
computer network reliability; file organisation; multiprocessor interconnection networks; next generation networks; parallel machines; NIC based RDMA implementation; RDMA transmission; adaptive unreliable network; communication fabric; computation core; high performance communication; low system-wide network reliability; next generation interconnection network; next generation super computer; out-of-order packets delivery; parallel computer; parallel workload; partial RDMA data; performance degradation; receiver side sliding window; reliability interconnection network; reliable point-to-point communication; remote direct memory access; scalable performance benefit; sender side sliding window approach; Computers; Hardware; Indexes; Multiprocessor interconnection; Out of order; Payloads; Receivers;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer Systems and Applications (AICCSA), 2014 IEEE/ACS 11th International Conference on
Type :
conf
DOI :
10.1109/AICCSA.2014.7073213
Filename :
7073213