Title :
Fault-tolerant communication with partitioned dimension-order routers
Author :
Boppana, Rajendra V. ; Chalasani, Suresh
Author_Institution :
Div. of Comput. Sci., Texas Univ., San Antonio, TX, USA
fDate :
10/1/1999 12:00:00 AM
Abstract :
Current fault-tolerant routing methods require extensive changes to practical routers, such as the Cray T3D's dimension-order router, to handle faults. In this paper, we propose methods to handle faults in multicomputers with dimension-order routers with simple changes to router structure and logic. Our techniques can be applied to current implementations in which the router is partitioned into multiple modules and no centralized crossbar is used. We consider arbitrarily located faulty blocks and assume only local knowledge of faults. We apply our techniques to torus networks and show that, with as few as four virtual channels per physical channel, deadlock- and livelock-free routing can be provided even with multiple faults and multimodule implementation of routers. Our simulations of the proposed technique for 2D tori and meshes indicate that the performance degradation is similar to that seen in the case of previously proposed crossbar-based designs.
Keywords :
fault tolerant computing; multiprocessor interconnection networks; performance evaluation; telecommunication network routing; 2D tori; Cray T3D; arbitrarily located faulty blocks; deadlock-free routing; dimension-order router; dimension-order routers; fault-tolerant communication; livelock-free routing; multicomputers; partitioned dimension-order routers; router structure; simulations; torus networks; virtual channels; Communication switching; Degradation; Fault tolerance; Logic; Network topology; Pins; Rivers; Routing; System recovery; Throughput;
Journal_Title :
Parallel and Distributed Systems, IEEE Transactions on