Title :
Recovery in fault-tolerant distributed microcontrollers
Author :
Rennels, David A. ; Hwang, Riki
Author_Institution :
Dept. of Comput. Sci., California Univ., Los Angeles, CA, USA
Abstract :
The paper describes the use of fault tolerance in a microcontroller node intended for a network of embedded processors. The work is primarily motivated by long-life space applications, where radiation-induced transient errors will occur frequently and a few chip failures may be expected before a mission is completed. A testbed has been constructed, and a real-time executive has been developed and tested on it. Preliminary fault-insertion testing has begun. Because of interconnection constraints for latchup circumvention, and for other reasons, we have chosen a design that is not Byzantine resilient. Even though inconsistent signaling may occasionally occur, multiple recovery actions must converge to a successful test and restart of the system to regain correct functionality.
Keywords :
fault tolerant computing; microcontrollers; multiprocessing systems; multiprocessor interconnection networks; real-time systems; supervisory programs; system recovery; Byzantine resilient; chip failures; embedded processors; fault-tolerant distributed microcontrollers; inconsistent signaling; interconnection constraints; latchup circumvention; long-life space applications; microcontroller node; multiple recovery actions; preliminary fault-insertion testing; radiation-induced transient errors; real time executive; testbed; Application software; Built-in self-test; Computer errors; Fault tolerance; Hardware; Master-slave; Microcontrollers; Space missions; Space vehicles; Testing;
Conference_Title :
Dependable Systems and Networks, 2001. DSN 2001. International Conference on
Conference_Location :
Goteborg, Sweden
Print_ISBN :
0-7695-1101-5
DOI :
10.1109/DSN.2001.941431