Title :
Segment-based routing: an efficient fault-tolerant routing algorithm for meshes and tori
Author :
Mejia, A. ; Flich, J. ; Duato, J. ; Reinemo, Sven-Arne ; Skeie, Tor
Author_Institution :
Dpto de Informatica de Sistemas y Computadores, Univ. Politecnica de Valencia
Abstract :
Computers get faster every year, but the demand for computing resources seems to grow at an even faster rate. Depending on the problem domain, this demand for more power can be satisfied by either massively parallel computers or clusters of computers. Common to both approaches is the dependence on high-performance interconnection networks such as Myrinet, InfiniBand, or 10 Gigabit Ethernet. While high throughput and low latency are key features of interconnection networks, fault tolerance is becoming increasingly important. As the number of network components grows, so does the probability of failure, so the fault-tolerance mechanisms of interconnection networks must also be considered. The main challenge then lies in combining performance and fault tolerance while still keeping cost and complexity low. This paper proposes a new deterministic routing methodology for tori and meshes that achieves high performance without the use of virtual channels. Furthermore, it is topology-agnostic in nature, meaning it can handle any topology derived from any combination of faults when combined with static reconfiguration. The algorithm, referred to as segment-based routing (SR), works by partitioning a topology into subnets, and subnets into segments. This allows us to place bidirectional turn restrictions locally within a segment. Because segments are independent, turn restrictions can be placed within one segment independently of the others, which gives a larger degree of freedom when placing turn restrictions than other routing strategies offer. In this paper, a way to compute segment-based routing tables is presented and applied to meshes and tori. Evaluation results show that SR increases performance by a factor of 1.8 over FX and up*/down* routing.
Keywords :
fault tolerant computing; multiprocessor interconnection networks; telecommunication network routing; telecommunication network topology; deterministic routing; fault-tolerant routing; interconnection networks; meshes; segment-based routing; tori; Clustering algorithms; Concurrent computing; Delay; Ethernet networks; Fault tolerance; Multiprocessor interconnection networks; Routing; Strontium; Throughput; Topology;
Conference_Title :
Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International
Conference_Location :
Rhodes Island
Print_ISBN :
1-4244-0054-6
DOI :
10.1109/IPDPS.2006.1639341