DocumentCode
122384
Title
Quasi Fat Trees for HPC Clouds and Their Fault-Resilient Closed-Form Routing
Author
Zahavi, Eitan ; Keslassy, Isaac ; Kolodny, Avinoam
Author_Institution
Mellanox, USA
fYear
2014
fDate
26-28 Aug. 2014
Firstpage
41
Lastpage
48
Abstract
High-Performance Computing (HPC) Clusters and Data Center Networks often rely on fat-tree topologies. However, fat trees and their known variants are not designed for concurrent small jobs. As a result, in recent years, HPC designers have introduced ad-hoc topologies to offer better performance for these concurrent small jobs. In this paper, we present and formally define these topologies, which we call Quasi Fat Trees (QFTs). Specifically, we formulate the graph structure of these new topologies and show that they perform better for concurrent small jobs. Furthermore, we derive a closed-form and fault-resilient contention-free routing algorithm for all global shift permutations. This routing optimizes the run-time of large computing jobs that utilize MPI collectives. Finally, we verify the algorithm by running its implementation as an OpenSM routing engine on QFT topologies of various sizes and show that it exhibits good performance.
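The routing above targets global shift permutations, the traffic pattern in which every rank i sends to rank (i + s) mod N for a fixed shift s. The sketch below only illustrates that traffic pattern; it is not the paper's QFT routing algorithm, and the function names (`shift_permutation`, `is_permutation`, `all_shift_stages`) are hypothetical. It also checks one commonly used property: shifts 1 through N-1 together cover every ordered (source, destination) pair exactly once, so an all-to-all exchange can be scheduled as N-1 shift stages.

```python
def shift_permutation(n_hosts: int, shift: int) -> list[int]:
    """Return the destination of each source rank under a global shift."""
    # Rank src sends to rank (src + shift) mod n_hosts.
    return [(src + shift) % n_hosts for src in range(n_hosts)]


def is_permutation(dest: list[int]) -> bool:
    """True if every host receives from exactly one sender."""
    return sorted(dest) == list(range(len(dest)))


def all_shift_stages(n_hosts: int) -> dict[int, list[int]]:
    """Map each non-trivial shift s (1..n_hosts-1) to its permutation."""
    return {s: shift_permutation(n_hosts, s) for s in range(1, n_hosts)}


if __name__ == "__main__":
    n = 8
    stages = all_shift_stages(n)
    # Each stage is a permutation: one send and one receive per host.
    assert all(is_permutation(d) for d in stages.values())
    # Stages 1..n-1 cover every ordered pair (src, dst), src != dst, exactly once.
    pairs = [(src, dst) for d in stages.values() for src, dst in enumerate(d)]
    assert len(pairs) == len(set(pairs)) == n * (n - 1)
    print(shift_permutation(n, 3))
```

Running the script prints `[3, 4, 5, 6, 7, 0, 1, 2]`, the destinations for a shift of 3 on 8 hosts. A routing is contention-free for such a stage when no two of these source/destination flows share a link in the same direction.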
Keywords
cloud computing; computer centres; parallel processing; software fault tolerance; topology; trees (mathematics); workstation clusters; DCN; HPC clouds; OpenSM routing engine; QFT topology; data center networks; fault-resilient closed-form routing; high-performance computing clusters; quasi fat trees; Clustering algorithms; Joining processes; Network topology; Ports (Computers); Routing; Topology; Vegetation; Fat Tree; HPC; Routing; Topology;
fLanguage
English
Publisher
ieee
Conference_Titel
High-Performance Interconnects (HOTI), 2014 IEEE 22nd Annual Symposium on
Conference_Location
Mountain View, CA
Type
conf
DOI
10.1109/HOTI.2014.19
Filename
6925717