DocumentCode
1159935
Title
Design of algorithm-based fault-tolerant multiprocessor systems for concurrent error detection and fault diagnosis
Author
Vinnakota, Bapiraju ; Jha, Niraj K.
Author_Institution
Dept. of Electr. Eng., Minnesota Univ., Minneapolis, MN, USA
Volume
5
Issue
10
fYear
1994
fDate
10/1/1994 12:00:00 AM
Firstpage
1099
Lastpage
1106
Abstract
Algorithm-based fault tolerance (ABFT) is a low-overhead system-level concurrent error detection and fault location scheme for multiprocessor systems. We present new methods for the design of ABFT systems. Our design procedure is applicable to a wide range of systems in which processors share data elements. A feature of our design approach is that the system designer can control the type of checks used in the final system. We also present some new bounds on the number of checks needed in ABFT system design.
Keywords
fault location; fault tolerant computing; multiprocessing systems; parallel architectures; reliability; system recovery; ABFT system design; ABFT systems; algorithm-based fault tolerance; algorithm-based multiprocessor systems; concurrent error detection; data element sharing; design procedure; fault diagnosis; fault location scheme; fault-tolerant multiprocessor systems; low-overhead system-level error detection; Algorithm design and analysis; Control systems; Design methodology; Fault detection; Fault diagnosis; Fault location; Fault tolerance; Fault tolerant systems; Multiprocessing systems; Signal processing algorithms;
fLanguage
English
Journal_Title
Parallel and Distributed Systems, IEEE Transactions on
Publisher
IEEE
ISSN
1045-9219
Type
jour
DOI
10.1109/71.313125
Filename
313125