Title :
The fault tolerant parallel processor operating system concepts and performance measurement overview
Author :
Babikyan, Carol A.
Author_Institution :
Charles Stark Draper Lab. Inc., Cambridge, MA, USA
Abstract :
Mission critical applications of the future will require a computing system capable of both high throughput and very high reliability. The fault tolerant parallel processor (FTPP), a system designed specifically to satisfy these goals, is described. The FTPP architecture consists of interconnection network/redundancy management hardware and standard commercial processors. The architecture allows throughput and reliability to be balanced as appropriate for a given application. Furthermore, to maintain high system reliability, the FTPP expeditiously identifies faulty components and performs some remedial operations. These redundancy management functions are performed by the operating system to relieve the application from knowledge of the underlying fault tolerance. How the operating system achieves redundancy management in conjunction with the fault tolerant hardware is described. Performance data to characterize system behavior are presented. Performance measurements indicate that the cost of fault tolerance does not impose a significant penalty: performing redundancy management functions requires a mere 0.93 ms/frame more than a simplex processor performing no redundancy management.
Keywords :
fault tolerant computing; parallel architectures; performance evaluation; redundancy; reliability; Byzantine resilience; cluster architecture; cost; digital avionics; fault tolerant parallel processor; flexibility; interconnection network/redundancy management; message handling; mission critical applications; mission critical system; parallel architecture; performance measurement; redundancy management; synchronisation; Computer architecture; Computer network management; Fault tolerant systems; Hardware; Maintenance; Mission critical systems; Multiprocessor interconnection networks; Operating systems; Redundancy; Throughput
Conference_Title :
Digital Avionics Systems Conference, 1990. Proceedings., IEEE/AIAA/NASA 9th
Conference_Location :
Virginia Beach, VA
DOI :
10.1109/DASC.1990.111316