Title :
The design of fault tolerant, high-performance control systems
Author_Institution :
Dept. of Electron., York Univ., UK
Abstract :
This paper discusses fault tolerance in parallel and distributed control systems. Designing fault tolerance into parallel systems raises a number of difficulties beyond those met in the design of sequential systems. In addition to the problems associated with single-processor system design, such as error detection and system recovery, parallel system designs must also consider error confinement, communication faults, the distributed placement of fault-tolerant mechanisms, and the coordination of error detection and system recovery. The complexity of parallel and distributed systems places a considerable burden on the system designer if systems are to be resilient to faults. Automated and semi-automated tools and language facilities are required to help with such designs. The paper considers work designed to address some of these problems, in an attempt to make parallel and distributed systems both efficient and fault-tolerant, which is the goal in designing all such systems.
Keywords :
computerised control; error detection; fault tolerant computing; parallel processing; system recovery; automated tools; communication faults; complexity; distributed control systems; distributed placement; error confinement; fault tolerance; fault-tolerant mechanisms; high-performance control systems; parallel systems;
Conference_Title :
High Performance Computing for Advanced Control, IEE Colloquium on
Conference_Location :
London