DocumentCode :
2364755
Title :
ART: robustness of meshes and tori for parallel and distributed computation
Author :
Yeh, Chi-Hsiang ; Parhami, Behrooz
Author_Institution :
Dept. of Electr. & Comput. Eng., Queen's Univ., Canada
fYear :
2002
fDate :
2002
Firstpage :
463
Lastpage :
472
Abstract :
We formulate array robustness theorems (ARTs) for efficient computation and communication on faulty arrays. No hardware redundancy is required and no assumption is made about the availability of a complete submesh or subtorus. Based on ARTs, a very wide variety of problems, including sorting, FFT, total exchange, permutation, and some matrix operations, can be solved with a slowdown factor of 1+o(1). The number of faults tolerated by ARTs ranges from o(min(n^(1-1/d), n/d, n/h)) for n-ary d-cubes with worst-case faults to as large as o(N) for most N-node 2-D meshes or tori with random faults, where h is the number of data items per processor. The resultant running times are the best results reported thus far for solving many problems on faulty arrays. Based on ARTs and several other components such as robust libraries, the priority emulation discipline, and X'Y' routing, we introduce the robust adaptation interface layer (RAIL) as a middleware between ordinary algorithms/programs and the faulty network/hardware. In effect, RAIL provides a virtual fault-free network to higher layers, while ordinary algorithms/programs are transformed through RAIL into corresponding robust algorithms/programs that can run on faulty networks.
Keywords :
fault tolerant computing; matrix multiplication; multiprocessor interconnection networks; parallel processing; sorting; telecommunication network routing; FFT; RAIL; array robustness theorems; distributed computation; fault-free arrays; faulty arrays; matrix operations; meshes; middleware; parallel computation; permutation; random faults; robust adaptation interface layer; slowdown factor; sorting; tori; total exchange; Art; Concurrent computing; Distributed computing; Hardware; Libraries; Rails; Redundancy; Robustness; Sorting; Subspace constraints;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel Processing, 2002. Proceedings. International Conference on
ISSN :
0190-3918
Print_ISBN :
0-7695-1677-7
Type :
conf
DOI :
10.1109/ICPP.2002.1040903
Filename :
1040903