• DocumentCode
    122384
  • Title
    Quasi Fat Trees for HPC Clouds and Their Fault-Resilient Closed-Form Routing
  • Author
    Zahavi, Eitan ; Keslassy, Isaac ; Kolodny, Avinoam
  • Author_Institution
    Mellanox, USA
  • fYear
    2014
  • fDate
    26-28 Aug. 2014
  • Firstpage
    41
  • Lastpage
    48
  • Abstract
    High-Performance Computing (HPC) Clusters and Data Center Networks often rely on fat-tree topologies. However, fat trees and their known variants are not designed for concurrent small jobs. As a result, in recent years, HPC designers have introduced ad-hoc topologies to offer better performance for these concurrent small jobs. In this paper, we present and formally define these topologies, which we call Quasi Fat Trees (QFTs). Specifically, we formulate the graph structure of these new topologies, and show that they perform better for concurrent small jobs. Furthermore, we derive a closed-form and fault-resilient contention-free routing algorithm for all global shift permutations. This routing optimizes the run-time of large computing jobs that utilize MPI collectives. Finally, we verify the algorithm by running its implementation as an OpenSM routing engine on various sizes of QFT topologies, and show that it exhibits good performance.
  • Keywords
    cloud computing; computer centres; parallel processing; software fault tolerance; topology; trees (mathematics); workstation clusters; DCN; HPC clouds; OpenSM routing engine; QFT topology; data center networks; fault-resilient closed-form routing; high-performance computing clusters; quasi fat trees; Clustering algorithms; Joining processes; Network topology; Ports (Computers); Routing; Vegetation; Fat Tree; HPC
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    2014 IEEE 22nd Annual Symposium on High-Performance Interconnects (HOTI)
  • Conference_Location
    Mountain View, CA
  • Type
    conf
  • DOI
    10.1109/HOTI.2014.19
  • Filename
    6925717