DocumentCode :
800422
Title :
Automatic reconfiguration and yield of the TESH multicomputer network
Author :
Maziarz, B.M. ; Jain, V.K.
Author_Institution :
SKF Condition Monitoring, San Diego, CA, USA
Volume :
51
Issue :
8
fYear :
2002
fDate :
8/1/2002
Firstpage :
963
Lastpage :
972
Abstract :
This paper considers defect tolerance issues for parallel computing systems based on a new interconnection network, namely "Tori connected mESHes (TESH)". Key features of this network are the following: it is hierarchical, thus allowing exploitation of computation locality and systematic expansion up to a million processors; and it appears to be well-suited for VLSI/ULSI realization, including 3D implementation. The goal here is to present efficient reconfiguration algorithms for such hierarchical parallel computing systems. Despite the dramatic improvement in defect density in recent years, it is still necessary to provide redundancy and defect circumvention to achieve acceptable system-level yields for large multicomputer systems. The TESH-based parallel systems are no exception. Therefore, we develop placement and routing algorithms that assign logical nodes to healthy physical nodes and configure switches to bypass the defective cells, switches and links. Simulations indicate that the placement is nearly 100 percent effective, while the routing performance diminishes with increasing defect density for a given extent of redundancy. The approach scales up well because, in TESH networks, essentially the same kind of sparing is used at all levels.
Keywords :
fault tolerant computing; message passing; multiprocessing systems; multiprocessor interconnection networks; parallel architectures; reconfigurable architectures; redundancy; TESH multicomputer network; ULSI; VLSI; fault-tolerance; hierarchical networks; interconnection networks; parallel computing systems; reconfiguration; routing algorithms; Computer aided manufacturing; Computer networks; Concurrent computing; Parallel processing; Routing; Switches; Ultra large scale integration; Very large scale integration
fLanguage :
English
Journal_Title :
Computers, IEEE Transactions on
Publisher :
IEEE
ISSN :
0018-9340
Type :
jour
DOI :
10.1109/TC.2002.1024742
Filename :
1024742