Title :
The robust-algorithm approach to fault tolerance on processor arrays: fault models, fault diameter, and basic algorithms
Author :
Parhami, Behrooz ; Yeh, Chi-Hsiang
Author_Institution :
Dept. of Electr. & Comput. Eng., California Univ., Santa Barbara, CA, USA
Date :
30 Mar-3 Apr 1998
Abstract :
With few exceptions, the two issues of algorithm design and fault tolerance for processor arrays have been dealt with separately: algorithm developers have assumed the availability of complete fault-free arrays, and fault tolerance techniques have aimed at restoring such complete arrays by reconfiguring faulty ones. We present the design of robust algorithms that run efficiently on complete arrays but also tolerate faulty processors/links in a degraded mode. This is a complementary approach in that our algorithms can be used on reconfigurable arrays that tolerate a certain number of faults while maintaining their regularity, with the graceful degradation feature kicking in once the fault tolerance limit of the reconfiguration scheme is exceeded. Under the fault models considered in this paper, faulty processors/links are either removed from the pool of resources (removal model) or bypassed in their respective rows/columns (bypass model). We discuss the two models, derive tight upper bounds for the fault diameter of the resulting networks, and present building-block algorithms for semigroup computation, parallel prefix computation, data rearrangement, matrix multiplication, and sorting.
Keywords :
fault tolerant computing; matrix multiplication; multiprocessing systems; multiprocessor interconnection networks; parallel algorithms; parallel architectures; reconfigurable architectures; sorting; adaptive parallel algorithm; algorithm design; bypass model; data rearrangement; degradation feature; fault diameter; fault models; fault tolerance; fault-free arrays; parallel prefix computation; processor arrays; reconfigurable arrays; removal model; robust algorithm approach; semigroup computation; upper bounds; Algorithm design and analysis; Concurrent computing; Degradation; Error correction; Fault detection; Hypercubes; Robustness; Routing
Conference_Title :
Parallel Processing Symposium, 1998. IPPS/SPDP 1998. Proceedings of the First Merged International ... and Symposium on Parallel and Distributed Processing 1998
Conference_Location :
Orlando, FL
Print_ISBN :
0-8186-8404-6
DOI :
10.1109/IPPS.1998.670010