Title :
Commodity Converged Fabrics for Global Address Spaces in Accelerator Clouds
Author :
Young, Jeffrey; Yalamanchili, Sudhakar
Author_Institution :
Dept. of Electr. & Comput. Eng., Georgia Inst. of Technol., Atlanta, GA, USA
Abstract :
Hardware support for Global Address Spaces (GAS) has previously focused on providing efficient access to remote memories, typically through custom interconnects or high-level software layers. New technologies such as Extoll, HyperShare, and NumaConnect now offer cheaper ways to build GAS support into the data center, making high-performance coherent and non-coherent remote memory access available to standard data center applications. At the same time, data center designers are experimenting with greater use of accelerators such as GPUs to speed up traditionally CPU-oriented workloads, such as data warehousing queries for in-core databases. However, few workable approaches exist for these accelerator clusters that both use commodity interconnects and support simple multi-node programming models such as GAS. We propose a new commodity-based approach for supporting non-coherent GAS in accelerator clouds using the HyperTransport Consortium's HyperTransport over Ethernet (HToE) specification. This work details a system model for using HToE for accelerated data warehousing applications and investigates potential bottlenecks and design optimizations for an HToE network adapter, or HyperTransport Ethernet Adapter (HTEA). Using a detailed network simulator model and timing measured for queries run on high-end GPUs, we find that the addition of wider de-encapsulation pipelines and the use of bulk acknowledgments in the HTEA can improve overall throughput and reduce latency for multiple senders using a common accelerator. Furthermore, we show that the bandwidth of one receiving HTEA can vary from 2.8 Gbps to 24.45 Gbps, depending on the optimizations used, and that the inter-HTEA latency for a single packet is 1,480 ns. A brief analysis of the path from remote memory to accelerators also demonstrates that the bandwidth of today's GPUs can easily handle a stream-based computation model using HToE.
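A rough sanity check of the abstract's final claim (the PCIe figure below is an assumption about 2012-era GPUs, not a number stated in the abstract): the peak receive-side HTEA bandwidth reported above converts to roughly 3 GB/s, which sits well under the host-to-device bandwidth of a GPU attached over a PCIe 2.0 x16 link:

\[
  24.45\ \text{Gbps} \div 8\ \tfrac{\text{bits}}{\text{byte}} \approx 3.06\ \text{GB/s}
  \;\ll\; 8\ \text{GB/s}\ (\text{PCIe 2.0}\times 16,\ \text{theoretical peak})
\]

Under that assumption, even a fully optimized inbound HToE stream leaves ample headroom on the accelerator's ingest link, consistent with the stream-based computation model described above.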
Keywords :
cloud computing; computer centres; data warehouses; graphics processing units; local area networks; multiprocessing systems; optimisation; parallel programming; pattern clustering; query processing; storage management; CPU-oriented processes; GAS; HToE specification; HyperTransport Ethernet Adapter; accelerator clouds; accelerator clusters; commodity converged fabrics; commodity-based approach; custom interconnects; data warehousing applications; de-encapsulation pipelines; design optimizations; global address spaces; high-end GPU; high-level software layers; high-performance coherent remote memory access; high-performance noncoherent remote memory access; HyperTransport Consortium HyperTransport over Ethernet; inter-HTEA latency; network simulator model; remote memories; remote memory; standard data center applications; stream-based computation model; system model; Data models; Fabrics; Graphics processing unit; Hardware; Peer to peer computing; Protocols; Standards; Databases; Ethernet; GAS; GPGPU; Networks
Conference_Title :
2012 IEEE 14th International Conference on High Performance Computing and Communications & 2012 IEEE 9th International Conference on Embedded Software and Systems (HPCC-ICESS)
Conference_Location :
Liverpool
Print_ISBN :
978-1-4673-2164-8
DOI :
10.1109/HPCC.2012.48