Virtual-to-Physical address translation for an FPGA-based interconnect with host and GPU remote DMA capabilities

Author

Ammendola, Roberto ; Biagioni, Andrea ; Frezza, Ottorino ; Lo Cicero, Francesca ; Lonardo, Alessandro ; Paolucci, Pier Stanislao ; Rossetti, Davide ; Simula, Francesco ; Tosoratto, Laura ; Vicini, Piero

Author_Institution

Sez. Roma “Tor Vergata”, INFN, Rome, Italy

fYear

2013

fDate

9-11 Dec. 2013

Firstpage

58

Lastpage

65

Abstract

We developed a custom FPGA-based Network Interface Controller named APEnet+, aimed at GPU-accelerated clusters for High Performance Computing. The card exploits the peer-to-peer capabilities (GPU-Direct RDMA) of the latest NVIDIA GPGPU devices and the RDMA paradigm to perform fast direct communication between computing nodes, offloading network tasks from the host CPU. In this work we focus on the implementation of a virtual-to-physical address translation mechanism using the FPGA's embedded soft-processor. Address management is the most demanding task on the receiving side of the NIC (we estimated up to 70% of the μC load), which makes it the main cause of the data bottleneck. To improve the performance of this task, and hence data transfer over the network, we added a specialized hardware logic block acting as a Translation Lookaside Buffer. This block relies on a custom Content Addressable Memory implementation designed for scalability and speed. We present detailed measurements that demonstrate the benefits of this custom logic: a substantial reduction in address translation latency (from a measured value of 1.9 μs to 124 ns) and improved performance for both host-bound and GPU-bound data transfers (a bandwidth increase of up to ~60%) in given message size ranges.
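
The translation scheme summarized in the abstract can be pictured with a small software model. The sketch below is illustrative only and is not the APEnet+ design: the class name ToyTLB, the 4 KiB page size, the capacity, and the LRU eviction policy are all assumptions, and an ordered dictionary merely stands in for the Content Addressable Memory. The two latency constants are the figures quoted in the abstract; charging the 1.9 μs figure on every miss is likewise an assumption made for illustration.

```python
from collections import OrderedDict

PAGE_SIZE = 4096   # assumed page size in bytes (not stated in the abstract)
SLOW_NS = 1900     # soft-processor translation latency quoted in the abstract
FAST_NS = 124      # TLB translation latency quoted in the abstract


class ToyTLB:
    """Hypothetical model: fully associative lookup keyed by virtual page number."""

    def __init__(self, page_table, capacity=4):
        self.page_table = page_table   # virtual page -> physical page (slow path)
        self.capacity = capacity       # assumed number of entries
        self.entries = OrderedDict()   # stands in for the CAM
        self.elapsed_ns = 0            # accumulated modelled latency

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.entries:
            # Hit: the entry is found directly, charge the fast latency.
            self.entries.move_to_end(vpn)
            self.elapsed_ns += FAST_NS
        else:
            # Miss: fall back to the slow path, then cache the result.
            self.entries[vpn] = self.page_table[vpn]
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)   # evict least recently used (assumed policy)
            self.elapsed_ns += SLOW_NS
        return self.entries[vpn] * PAGE_SIZE + offset


tlb = ToyTLB({0: 7, 1: 3})
first = tlb.translate(0x10)    # miss: resolved through the page table
second = tlb.translate(0x20)   # hit: same virtual page, served from the cache
```

In this model the second lookup on the same page costs 124 ns instead of 1.9 μs, which is the roughly fifteenfold latency reduction the abstract reports for the hardware block.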

Keywords

access protocols; content-addressable storage; field programmable gate arrays; file organisation; graphics processing units; interconnections; logic design; network interfaces; APEnet+; FPGA embedded soft-processor; FPGA-based interconnect; FPGA-based network interface controller; GPU remote DMA capability; GPU-bound data transfers; GPU-direct RDMA; NVIDIA GPGPU devices; address management; computing nodes; content address memory; fast direct communication; hardware logic block; high performance computing; host CPU; network task execution; peer-to-peer capability; remote direct memory access protocol; substantial address translation latency reduction; time 1.9 μs to 124 ns; translation lookaside buffer; virtual-to-physical address translation mechanism; Computer aided manufacturing; Field programmable gate arrays; Graphics processing units; Hardware; Peer-to-peer computing; Protocols; Random access memory;

fLanguage

English

Publisher

IEEE

Conference_Titel

Field-Programmable Technology (FPT), 2013 International Conference on

Conference_Location

Kyoto

Print_ISBN

978-1-4799-2199-7

Type

conf

DOI

10.1109/FPT.2013.6718331

Filename

6718331