Title :
Fault-Tolerant Deadlock-Free Adaptive Routing for Any Set of Link and Node Failures in Multi-cores Systems
Author :
Chaix, Fabien ; Avresky, Dimiter ; Zergainoh, Nacer-Eddine ; Nicolaidis, Michael
Author_Institution :
TIMA, Grenoble, France
Abstract :
Future applications will require processors with many cores communicating through a regular interconnection network. Meanwhile, as deep submicron technology foreshadows an era of highly defective chips, fault-tolerant designs become compulsory. In particular, the fault tolerance of the core interconnect is critical, and it inevitably increases the interconnect's complexity. In this paper, we present a novel adaptive routing algorithm that is able to route messages in the presence of any set of multiple node and link failures, as long as a path exists. Compared to existing solutions, the proposed algorithm provides fault tolerance without using any routing table. It is scalable and can be applied to multicore chips with a 2D mesh core interconnect of any size. Based on Virtual Networks and Virtual Channels, the algorithm is deadlock-free and avoids infinite looping in both fault-free and faulty 2D meshes. We simulated the proposed algorithm under worst-case scenarios with respect to traffic patterns, with failure rates of up to 40%. Experimental results confirmed that the algorithm tolerates multiple failures even in the most extreme failure patterns. Additionally, we monitored the trade-off between fault tolerance and average latency in faulty cases, as a measure of performance degradation. The algorithm detects partitioning of the interconnect and enables "preferred paths" for streaming applications.
Keywords :
fault tolerance; microprocessor chips; multiprocessing systems; 2D mesh core; deadlock-free adaptive routing; link failure; multicore chips; node failure; traffic pattern; virtual channel; virtual network; Buffer storage; Fault tolerant systems; Multicore processing; Partitioning algorithms; Routing; System recovery; Fault-tolerant; adaptive routing; multi-cores chip; network-on-chip
Conference_Title :
Network Computing and Applications (NCA), 2010 9th IEEE International Symposium on
Conference_Location :
Cambridge, MA
Print_ISBN :
978-1-4244-7628-2
DOI :
10.1109/NCA.2010.14