Title :
Virtual-to-Physical address translation for an FPGA-based interconnect with host and GPU remote DMA capabilities
Author :
Ammendola, Roberto ; Biagioni, Andrea ; Frezza, Ottorino ; Lo Cicero, Francesca ; Lonardo, Alessandro ; Paolucci, Pier Stanislao ; Rossetti, Davide ; Simula, Francesco ; Tosoratto, Laura ; Vicini, Piero
Author_Institution :
Sez. Roma “Tor Vergata”, INFN, Rome, Italy
Abstract :
We developed a custom FPGA-based Network Interface Controller, named APEnet+, aimed at GPU-accelerated clusters for High Performance Computing. The card exploits the peer-to-peer capabilities (GPU-Direct RDMA) of the latest NVIDIA GPGPU devices and the RDMA paradigm to perform fast direct communication between computing nodes, offloading network task execution from the host CPU. In this work we focus on the implementation of a Virtual-to-Physical address translation mechanism using the FPGA embedded soft-processor. On the NIC receiving side, address management is the most demanding task (we estimated it at up to 70% of the μC load), making it the main cause of the data-transfer bottleneck. To improve the performance of this task, and hence of data transfer over the network, we added a specialized hardware logic block acting as a Translation Lookaside Buffer. This block makes use of a dedicated Content Addressable Memory implementation designed for scalability and speed. We present detailed measurements demonstrating the benefits of this custom logic: a substantial reduction in address translation latency (from a measured 1.9 μs to 124 ns) and improved performance of both host-bound and GPU-bound data transfers (up to ~60% bandwidth increase) in certain message size ranges.
Keywords :
access protocols; content-addressable storage; field programmable gate arrays; file organisation; graphics processing units; interconnections; logic design; network interfaces; APEnet+; FPGA embedded soft-processor; FPGA-based interconnect; FPGA-based network interface controller; GPU remote DMA capability; GPU-bound data transfers; GPU-direct RDMA; NVIDIA GPGPU devices; address management; computing nodes; content addressable memory; fast direct communication; hardware logic block; high performance computing; host CPU; network task execution; peer-to-peer capability; remote direct memory access protocol; substantial address translation latency reduction; time 1.9 μs to 124 ns; translation lookaside buffer; virtual-to-physical address translation mechanism; Computer aided manufacturing; Field programmable gate arrays; Graphics processing units; Hardware; Peer-to-peer computing; Protocols; Random access memory;
Conference_Title :
Field-Programmable Technology (FPT), 2013 International Conference on
Conference_Location :
Kyoto
Print_ISBN :
978-1-4799-2199-7
DOI :
10.1109/FPT.2013.6718331