Title :
Fault-tolerant processor arrays using space and time redundancy
Author :
Kwai, Ding-Ming ; Parhami, Behrooz
Author_Institution :
Dept. of Electrical & Computer Engineering, University of California, Santa Barbara, CA, USA
Abstract :
Spare processors in a processor array are usually idle during normal operation. They are used only after a fault has been detected through periodic or on-line diagnosis and the array has been reconfigured to include them. In this paper we propose a design methodology in which the spare processors support a data-driven error detection scheme. Our method attaches tags to data streams, allowing the data items to carry their own control information. A checking processor changes the tags when it detects a disagreement among replicated computation results. The faulty processor can then be located using error information derived from two distinct data streams. We incorporate these space- and time-redundancy techniques into a fault-tolerant processor array that can provide different levels of fault tolerance depending on how many fault-free processors are available. The scheme is also flexible in that it can trade error detection capability for added computational throughput.
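The tag-and-locate idea in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' design: it assumes an n-by-n array in which every processor lies on exactly one row stream and one column stream, that each result is duplicated on a fault-free spare (space redundancy; recomputing on the same processor would model time redundancy instead), and that at most one processor is faulty. The names `Tagged`, `step`, `run_stream`, and `locate`, and the doubling computation, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Tagged:
    """A data item that carries its own control information (an error tag)."""
    value: int
    error: bool = False


def primary(pid, x, faulty):
    # Hypothetical computation on processor `pid`; the processor `faulty` corrupts its output.
    return 2 * x + (1 if pid == faulty else 0)


def replica(x):
    # Assumption: the replicated computation runs on a fault-free spare.
    return 2 * x


def step(pid, item, faulty):
    # Checking processor: compare the replicated results and set the tag on disagreement.
    a = primary(pid, item.value, faulty)
    b = replica(item.value)
    # Forward the primary result; the tag, not the value, records the error (tags are sticky).
    return Tagged(a, item.error or a != b)


def run_stream(pids, faulty):
    # Pass one tagged data item through the processors on a single stream.
    item = Tagged(1)
    for pid in pids:
        item = step(pid, item, faulty)
    return item


def locate(n, faulty):
    # Hypothetical layout: row stream i visits (i, 0..n-1); column stream j visits (0..n-1, j).
    rows = [run_stream([(i, j) for j in range(n)], faulty) for i in range(n)]
    cols = [run_stream([(i, j) for i in range(n)], faulty) for j in range(n)]
    bad_rows = [i for i, t in enumerate(rows) if t.error]
    bad_cols = [j for j, t in enumerate(cols) if t.error]
    # A single flagged row and a single flagged column intersect at the faulty processor.
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return (bad_rows[0], bad_cols[0])
    return None


if __name__ == "__main__":
    print(locate(4, faulty=(2, 1)))
    print(locate(4, faulty=None))
```

Each stream's tag only records that an error occurred somewhere along that stream; intersecting the one flagged row with the one flagged column identifies the processor. When no stream is flagged (or more than one row or column is), the sketch returns `None` rather than guessing.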
Keywords :
error detection; fault tolerant computing; parallel processing; data-driven error detection scheme; design methodology; error information; fault-tolerant processor arrays; faulty processor; online diagnosis; periodic diagnosis; processor array; space redundancy; spare processors; time redundancy; Design methodology; Electrical fault detection; Encoding; Error correction; Fault detection; Fault diagnosis; Fault tolerance; Joining processes; Redundancy; Throughput;
Conference_Title :
1996 IEEE Second International Conference on Algorithms & Architectures for Parallel Processing (ICAPP 96)
Print_ISBN :
0-7803-3529-5
DOI :
10.1109/ICAPP.1996.562889